ABSTRACT

This paper proposes a data tree-rewriting framework for
modeling evolving documents. The framework is close to
Guarded Active XML, a platform used for handling XML
repositories evolving through web services. We focus on
automatic verification of properties of evolving documents that
can contain data from an infinite domain. We establish the
boundaries of decidability, and show that verification of a
positive fragment that can handle recursive service calls is
decidable. We also consider bounded model-checking in our
data tree-rewriting framework and show that it is NexpTime-complete.
1. INTRODUCTION

From static in-house solutions, databases have become
more and more open to the world, offering e.g. half-open
access through web services. As usual for open systems, their
design requires a careful static analysis process, helping to
guarantee that no malicious client may take advantage of
the system in a way for which the system was not designed.
Static analysis of such systems has very recently brought
together two areas: databases, with emphasis on semi-structured
XML data, and automated verification, with emphasis on
model-checking infinite-state systems. Systems modeling the
dynamic evolution of data are particularly challenging for
automated verification, as they involve feedback loops between
semi-structured data, possibly with values from unbounded
domains, and the workflow of services. While each of these
topics has been studied extensively on its own, very few papers
tackle the decidability of verification when all aspects are present
at the same time.

An interesting platform emerged recently for using XML
repositories evolving through web services, namely Active
XML (AXML) [9]. AXML documents are XML-based documents that
evolve dynamically, containing implicit data in the form of
embedded service calls. Services may be recursive, so the
evolution of such documents is both non-deterministic and
unbounded in time. A first paper analyzing the evolution of
AXML documents considered monotone documents [3].
With this restriction, as soon as a service is enabled in an
AXML tree T, the service cannot be disabled from this point
on, and calling it can only extend T. In particular,
information cannot be deleted and subsequent service calls
return answers that extend previous answers. Recently, a
workflow-oriented version of AXML was proposed in [5]: the
Guarded AXML model (GAXML for short) adds guards to
service calls, thus controlling the possible evolution of active
documents. Decidability in co-2NexpTime of static analysis
for recursion-free GAXML w.r.t. Tree-LTL properties was
established in [5]. The crucial restriction needed for
decidability there is a uniform bound on the number of possible
service calls. Compared to [3], service invocations can
terminate and, more importantly, negative guards can be used.
Even more importantly, verification tasks are more complex
and challenging because of the presence of unbounded data.
However, the model relies on a rigid semantics of what a
service call can do, and how (using a workspace, etc.). For
instance, deletion of data is not possible.

In this work, our aim is twofold. First, we aim at
extending the GAXML model in a uniform way, by expressing
the effect of embedded service calls in the form of tree rewriting
rules. Our model DTPRS (data tree pattern rewriting
systems) is based on the same basic ingredients as GAXML,
namely tree patterns for guards and queries. However,
our formalism allows a user to describe several possible
effects of a service call: materialization of implicit data as
in (G)AXML, but also deletion and modification of existing
document parts. This model is a simplified version of the
TPRS model proposed in [15], but it can additionally handle
data from infinite domains.

Our second, and main, objective is to obtain decidability of
static analysis of DTPRS without relying on a bound on the
number of service calls. To do so, we use a technique
that emerged in the verification of particular infinite-state
systems, such as Petri nets and lossy channel systems. The
main concept is known in verification as well-structured
transition systems (WSTS for short) [1, 13]. WSTS are one
example of infinite-state systems where (potentially) infinite
sets of states can be represented (and effectively manipulated)
symbolically in a finite way. In contrast, [5] uses a
small model property, which implies an enumeration of trees
up to some bound.



Our basic objects are data trees, i.e., trees whose leaves may
carry values from an infinite data domain. We view data trees as
graphs, and define in a natural way a well-quasi-order on such
graphs. Then we show that a uniform bound on the length of simple
paths in such graphs, together with positive guards, makes DTPRS
well-structured systems [1, 13]. As a technical tool we use
tree decompositions of graphs. In a nutshell, we trade
recursion for positiveness here, since allowing both
leads to undecidable static analysis. We show that for
positive DTPRS, termination and tree pattern reachability are
both decidable. Furthermore, we show that bounded
model-checking of (not necessarily positive) DTPRS is
NexpTime-complete. On the other hand, we show that the verification
of simple but non-positive temporal properties is
undecidable even for positive DTPRS.

Related work: Verification of web services often ignores
unbounded data (cf. e.g. [17, 14]). On the other hand,
several data-driven workflow process models have been
proposed. Document-driven workflow was proposed in [20].
Artifact-based workflow was outlined in [15], in which
artifacts are used to represent key business entities, including
both their data and life cycles. An early line of results
involving data establishes decidability boundaries for the
verification of temporal (first-order based) properties of
data-driven workflow processes, based on a relational data model
[10, 12]. This approach has recently been extended to
the artifact-based model.

On the verification side, there is a rich literature on the
verification of well-structured infinite transition systems [1, 13],
ranging from faulty communication systems [7] to
programs manipulating dynamic data [2] (citing only a few
recent contributions). The latter work is one of the few
examples where well-quasi-orders on graphs are used.

Organization of the paper: In the next section, we fix some
definitions and notations, define the DTPRS model, and
illustrate how to reduce GAXML to our DTPRS model. Then
in Section 3, we describe an example to illustrate the
expressivity of DTPRS. In Section 4, we show that the analysis of
DTPRS with recursive DTDs or negated tree patterns is undecidable. In
Section 5 we define positive DTPRS and prove our
decidability results. On the other hand, we show the
undecidability of the verification of general, non-positive temporal
properties in Section 6. Finally, in Section 7, we consider
bounded model-checking of (not necessarily positive)
DTPRS and show that the bounded model-checking problem
is NexpTime-complete. Omitted proofs can be found in the
appendix.

2. DEFINITIONS AND NOTATIONS 

In this paper, documents correspond to labeled, unranked
and unordered trees. Fix a finite alphabet Σ (with symbols
a, b, c, ..., called tags) and an infinite data domain 𝒟. A data
tree is a (rooted) tree T with nodes labeled by Σ ∪ 𝒟. A data
tree T can be represented as a tuple T = (V, E, root, ℓ), with
labeling function ℓ : V → Σ ∪ 𝒟. Inner nodes are Σ-labeled,
whereas leaves are (Σ ∪ 𝒟)-labeled. We fix a finite set of
variables 𝒳 (with symbols X, Y, Z, ...) that will take values
in 𝒟, and use * as a special symbol standing for any tag. Let
Γ denote Σ ∪ 𝒳 ∪ {*}.

A data constraint is a Boolean combination of relations
X = Y, with¹ X, Y ∈ 𝒳.

¹ For simplicity we disallow here explicit data constants X = d
(d ∈ 𝒟): they can be simulated by tags from Σ.


A data tree pattern (DTP) P = (V, E, root, ℓ, τ, cond) is a
(rooted) Γ-labeled tree, together with an edge-labeling
function τ : E → {|, ||} and a data constraint cond. As usual,
|-labeled edges denote child edges, and ||-labeled edges
denote descendant edges. Internal nodes are labeled by Σ ∪ {*},
and leaves by Γ. A matching of a DTP P into a data tree
T is defined as a mapping that preserves the root, the Σ- and
𝒟-labels (with * as wildcard), and the child and descendant
relations, that satisfies cond, and that maps 𝒳-labeled nodes to
𝒟-labeled ones. In particular, a relation X = Y (X, Y ∈ 𝒳)
means that the corresponding leaves are mapped to leaves
of T carrying the same data value. If the mapping above is
injective, then it is called an injective matching of P into T.
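To make the matching semantics concrete, the following Python sketch enumerates the matchings of a DTP into a data tree by brute force. It is an illustration under assumptions, not the paper's algorithm: the class names, the helpers T, D, P, child and desc, and the convention of writing variables as "?X" are all hypothetical; the data constraint cond is passed as a Python predicate over the variable assignment; and matchings are not required to be injective (the injective matchings needed for locators would require an extra check).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

@dataclass
class DNode:
    """Data tree node: a tag (inner node or leaf) or a data value (leaf only)."""
    label: object
    is_data: bool = False
    children: List["DNode"] = field(default_factory=list)

@dataclass
class PNode:
    """Pattern node: a tag, the wildcard "*", or a variable written "?X" (assumed convention)."""
    label: str
    edges: List["PEdge"] = field(default_factory=list)

@dataclass
class PEdge:
    kind: str        # "child" for |-edges, "desc" for ||-edges
    target: PNode

def T(label, *kids): return DNode(label, False, list(kids))
def D(value): return DNode(value, True)
def P(label, *edges): return PNode(label, list(edges))
def child(p): return PEdge("child", p)
def desc(p): return PEdge("desc", p)

Env = Dict[str, object]   # variable -> data value

def descendants(t: DNode) -> Iterator[DNode]:
    for c in t.children:
        yield c
        yield from descendants(c)

def match_node(p: PNode, t: DNode, env: Env) -> Iterator[Env]:
    if p.label.startswith("?"):
        # a variable must be mapped to a data leaf, consistently with earlier bindings
        if not t.is_data or env.get(p.label, t.label) != t.label:
            return
        env = {**env, p.label: t.label}
    elif t.is_data or (p.label != "*" and p.label != t.label):
        return   # tags must agree; "*" matches any tag but no data value
    yield from match_edges(p.edges, t, env)

def match_edges(edges: List[PEdge], t: DNode, env: Env) -> Iterator[Env]:
    if not edges:
        yield env
        return
    first, rest = edges[0], edges[1:]
    candidates = t.children if first.kind == "child" else list(descendants(t))
    for c in candidates:
        for env1 in match_node(first.target, c, env):
            yield from match_edges(rest, t, env1)

def matchings(pattern: PNode, tree: DNode,
              cond: Callable[[Env], bool] = lambda env: True) -> List[Env]:
    """Assignments of all root-preserving (not necessarily injective) matchings satisfying cond."""
    return [env for env in match_node(pattern, tree, {}) if cond(env)]

# Usage: a cart item whose PId also occurs in the catalog (cond is X = Y).
tree = T("Play.com",
         T("Cart", T("PId", D(9221)), T("PId", D(9221))),
         T("Product", T("PId", D(9221)), T("token")))
pattern = P("Play.com",
            child(P("Cart", child(P("PId", child(P("?X")))))),
            child(P("Product", child(P("PId", child(P("?Y")))))))
same = matchings(pattern, tree, lambda e: e["?X"] == e["?Y"])
```

Since the cart holds two PId leaves with the same value, the pattern has two matchings here, both binding X and Y to 9221.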

A relative DTP is a DTP with one designated node self.
A relative DTP (P, self) is matched to a pair (T, v), where
T is a tree and v is a node of T, by a matching of P into T
that maps self to v.

We consider Boolean combinations of (relative) DTPs.
The patterns therein are matched independently of each other
(except that nodes designated by self must be matched to
the same node of T), and the Boolean operators are
interpreted with their standard meaning.

A data tree pattern query (DTPQ) is of the form body ⇝
head, with body a DTP and head a tree such that

• the internal nodes of head are labeled by Σ and its
leaves are labeled by Σ ∪ 𝒟 ∪ 𝒳,

• every variable occurring in head also occurs in body,

• there is at least one variable occurring in head, i.e., at
least one leaf of head is labeled by 𝒳.

Let T be a data tree and Q = body ⇝ head be a DTPQ.
The evaluation result of Q over T is the forest Q(T) of all
instantiations of head by matchings μ from body to T. A
relative DTPQ is like a DTPQ, except that its body is a
relative pattern.
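A DTPQ only adds an instantiation step on top of pattern matching. Assuming the matchings of body into T are already available as variable assignments (computed by some pattern matcher), the Python sketch below builds Q(T) by instantiating head once per matching. The tuple encoding of trees, the "?Y" naming of variables and the function names are hypothetical; keeping one tree per matching, rather than removing duplicate instances, is also an assumption, since the definition above does not settle it.

```python
from typing import Dict, List, Tuple

Tree = Tuple[object, list]   # (label, list of subtrees)
Env = Dict[str, object]      # variable -> data value, one per matching of body

def instantiate(head: Tree, env: Env) -> Tree:
    label, kids = head
    if isinstance(label, str) and label.startswith("?"):
        return (env[label], [])            # a variable leaf becomes a data leaf
    return (label, [instantiate(k, env) for k in kids])

def evaluate(head: Tree, body_matchings: List[Env]) -> List[Tree]:
    """Q(T): one instance of head per matching of body into T."""
    return [instantiate(head, env) for env in body_matchings]

# Usage: head [Price](Y), with the matchings of some body assumed to be precomputed.
head = ("Price", [("?Y", [])])
forest = evaluate(head, [{"?X": 9221, "?Y": "12.99"}, {"?X": 4410, "?Y": "5.00"}])
```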

A locator is a relative DTP L with additional labels from
the set {append, del} ∪ {ren_a | a ∈ Σ}. The labels append
and ren_a are not exclusive and can be attached only to nodes
of L that are labeled by a tag (that is, by Σ ∪ {*} but not by
𝒟 ∪ 𝒳). Nodes not labeled by append or ren_a can be labeled
by del (even data nodes), such that the descendants of a
node labeled by del are labeled by del, too. The intuition
behind this definition is to provide a context for data tree
rewriting rules, together with some possible actions on this
context: renaming, deletion, appending.

A data tree pattern rewriting rule (DTP rule) R is a tuple
(L, G, 𝒬, ℱ, χ) with:

• L is a locator,

• G is a Boolean combination of (relative) DTPs (the
guard of R),

• 𝒬 is a finite set of relative DTPQs,

• ℱ is a finite set of forests with internal nodes labeled
by Σ and leaves labeled by Σ ∪ 𝒟 ∪ 𝒳 ∪ 𝒬,

• χ is a mapping from the set of nodes of L labeled by
append to ℱ.

A DTP rewriting system (DTPRS) is a pair (ℛ, Δ)
consisting of a set ℛ of DTP rules and a static invariant Δ,
consisting of a DTD and a data invariant, i.e. a Boolean
combination of DTPs. We assume that the static invariant
Δ is preserved by the rewriting rules ℛ. As usual for
unordered trees, a DTD is defined as a tuple (Σ_r, 𝒫) such that
Σ_r is the set of allowed root labels, and 𝒫 is a finite set of
rules a → ψ such that a ∈ Σ and ψ is a Boolean
combination of inequalities of the form |b| ≥ k, where b ∈ Σ ∪ {dom}
(dom is a symbol standing for any data value), and k is
a non-negative integer. A positive DTD is one where the
Boolean combinations above are positive.
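The DTD conditions are local counting constraints, so checking them amounts to counting the children of each tag-labeled node. In this hypothetical Python sketch a rule a → ψ is a predicate over a Counter of child labels, with every data child counted under "dom"; treating tags without a rule as unconstrained is an assumption of the sketch. The example rules for Cart and PId are illustrative, not taken from the text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class Node:
    label: object
    is_data: bool = False
    children: List["Node"] = field(default_factory=list)

def tag(label, *kids): return Node(label, False, list(kids))
def data(value): return Node(value, True)

Formula = Callable[[Counter], bool]   # a Boolean combination of atoms |b| >= k

def at_least(symbol: str, k: int) -> Formula:
    """Atom |symbol| >= k, where symbol is a tag or "dom" (any data value)."""
    return lambda counts: counts[symbol] >= k

def child_counts(n: Node) -> Counter:
    return Counter("dom" if c.is_data else c.label for c in n.children)

def satisfies_dtd(root: Node, root_labels: Set[str], rules: Dict[str, Formula]) -> bool:
    if root.is_data or root.label not in root_labels:
        return False
    stack = [root]
    while stack:
        n = stack.pop()
        if n.is_data:
            continue
        rule = rules.get(n.label)          # tags without a rule are left unconstrained here
        if rule is not None and not rule(child_counts(n)):
            return False
        stack.extend(n.children)
    return True

# Hypothetical rules for part of the Play.com schema.
RULES: Dict[str, Formula] = {
    # positive: a cart has a products child and is in select or payment mode
    "Cart": lambda c: at_least("products", 1)(c)
                      and (at_least("select", 1)(c) or at_least("payment", 1)(c)),
    # "exactly one data value" needs a negated atom, so this rule is not positive
    "PId": lambda c: at_least("dom", 1)(c) and not at_least("dom", 2)(c),
}
```

The PId rule shows why "exactly one data value" falls outside positive DTDs: it needs the negated atom ¬(|dom| ≥ 2).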

We now formally define the semantics of a transition by
DTP rules. So let T = (V, E, root, ℓ) be a data tree (with
T ⊨ Δ) and let R = (L, G, 𝒬, ℱ, χ) be a DTP rule.

• Let μ be an injective matching from L to T. Let ν be
the assignment of data values to variables in L such
that ν(X) = ℓ(μ(v)) for every node v labeled by X ∈ 𝒳 in
L.

• For each variable X ∈ 𝒳 we denote its evaluation as
X(T), with X(T) = ν(X) if defined, and X(T) a fresh
data value otherwise. Here a fresh data value is a
data value which does not appear anywhere else in T.
Furthermore, it is required that all the new variables
of R, i.e. variables occurring in ℱ but not in L,
take mutually distinct fresh values. For each forest
F ∈ ℱ, we denote its evaluation by F(T), obtained by replacing
labels Q ∈ 𝒬 by Q(T) and labels X ∈ 𝒳 by X(T).
Recall that all queries Q ∈ 𝒬 are evaluated relative
to μ(self).

• A data tree T′ can be obtained from T by

- deleting subtrees rooted at nodes μ(v) whenever
v is labeled by del in L,

- changing the tag of a node μ(v) to a whenever v
is labeled by ren_a in L,

- appending F(T) as a subforest of nodes μ(v)
whenever v is labeled by append in L and χ(v) = F,

- every other node of T keeps its tag or data.

• The rule R is enabled on data tree T if there exists
an injective matching μ of L into T such that (1) G is
true on (T, μ(v)) with v labeled by self in L, and (2)
there is a data tree T′, obtained from T and μ by the
operations specified above, satisfying T′ ⊨ Δ.
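The rewriting step itself can be sketched once the injective matching μ has been found and the guard G checked. The Python sketch below assumes both are given: matched nodes are passed as paths of child indices, the forests of ℱ are tuple templates whose variables are written "?X", and query labels Q ∈ 𝒬 are not handled. All names are hypothetical. Unbound variables receive mutually distinct values not occurring in T (small integers, an arbitrary choice), and the rule is reported as not enabled (None) when the result violates the invariant, passed in as a predicate.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Node:
    label: object
    is_data: bool = False
    children: List["Node"] = field(default_factory=list)

Path = Tuple[int, ...]           # child indices from the root, standing for a matched node mu(v)
Template = Tuple[object, list]   # (tag or "?X", sub-templates): one tree of a forest F

def node_at(root: Node, path: Path) -> Node:
    for i in path:
        root = root.children[i]
    return root

def data_values(n: Node) -> set:
    vals = {n.label} if n.is_data else set()
    for c in n.children:
        vals |= data_values(c)
    return vals

def template_vars(t: Template) -> List[str]:
    label, kids = t
    out = [label] if isinstance(label, str) and label.startswith("?") else []
    for k in kids:
        out += template_vars(k)
    return out

def instantiate(t: Template, val: Dict[str, object]) -> Node:
    label, kids = t
    if isinstance(label, str) and label.startswith("?"):
        return Node(val[label], True)
    return Node(label, False, [instantiate(k, val) for k in kids])

def apply_rule(tree: Node, nu: Dict[str, object], renames: Dict[Path, str],
               appends: Dict[Path, List[Template]], deletes: List[Path],
               invariant: Callable[[Node], bool]) -> Optional[Node]:
    """Build T' from T for an already chosen injective matching; None if T' violates Delta."""
    new = copy.deepcopy(tree)
    used = data_values(tree)
    fresh = (v for v in itertools.count() if v not in used)   # mutually distinct, unused values
    val = dict(nu)                                              # variables bound by the locator
    for forest in appends.values():
        for t in forest:
            for x in template_vars(t):
                if x not in val:
                    val[x] = next(fresh)
    for path, a in renames.items():                             # ren_a
        node_at(new, path).label = a
    for path, forest in appends.items():                        # append F(T) after existing children
        node_at(new, path).children.extend(instantiate(t, val) for t in forest)
    for path in sorted(deletes, reverse=True):                  # del, later paths first so that
        del node_at(new, path[:-1]).children[path[-1]]          # remaining paths stay valid
    return new if invariant(new) else None

# Usage: rule create-cart, i.e. L = [Play.com append(F)], where X gets a fresh value.
t0 = Node("Play.com", False, [Node("Cart", False, [
        Node("nolog", False, [Node("CId", False, [Node(0, True)])]),
        Node("products"), Node("select")])])
cart = ("Cart", [("nolog", [("CId", [("?X", [])])]), ("products", []), ("select", [])])
t1 = apply_rule(t0, {}, {}, {(): [cart]}, [], lambda t: True)
```

Deletions are applied last and in reverse lexicographic order of paths, so removing a node never shifts the position of a node still to be processed. In the usage lines, the new cart receives the fresh CId value 1, since 0 already occurs in the tree.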

Let T →_R T′ denote the transition from T to T′ using
DTP rule R ∈ ℛ.

Remark 1. 1. The injectivity of the matching μ
ensures that the outcome T′ is well-defined. In
particular, no two nodes with labels del and append (or ren_a),
resp., can be mapped to the same node in the data tree.
However, mappings used for guards or queries are, as
usual, non-injective.

2. For the new variables occurring in ℱ, but not in L,
we choose mutually distinct fresh values. We could
have chosen arbitrary values instead, and enforced
a posteriori, using the invariant Δ, that they are fresh
and mutually distinct. In this case, the invariant
needs negation. The invariant (or the locator) can
also be used to enforce that the (arbitrarily) chosen
values already occur in T. This kind of invariant would
be positive.

3. In our definition of DTP rules, it might appear that
guards are redundant w.r.t. the locator. But notice that
this only concerns positive guards.

Given a DTPRS (ℛ, Δ), let T → T′ denote the union of
T →_R T′ for R ∈ ℛ, and T →⁺ T′ (or T →* T′) denote the
transitive (or reflexive and transitive) closure of T → T′.
Moreover, let ℛ(T) denote the set of trees that can be
reached from a data tree T by rewriting with DTP rules
from ℛ, i.e. ℛ(T) = {T′ | T →* T′}. For a set of data
trees I, let ℛ(I) be the union of ℛ(T), for T ∈ I.

We are interested in the following questions, given a
DTPRS (ℛ, Δ):

• Pattern reachability: Given a DTP P and a set of
initial trees² Init, given as the conjunction of a DTD
and a Boolean combination of DTPs, is there some
T ∈ ℛ(Init) such that P matches T?

• Termination: Given an initial data tree T₀, are all
rewriting paths T₀ → T₁ → ⋯ starting from T₀ finite?

Termination of DTPRS is defined above w.r.t. a single
initial data tree because termination from a set of initial
trees is already undecidable without data (see Section 4).

2.1 Reduction from GAXML to DTPRS 

DTPRS is a quite powerful model, which allows one to model
e.g. Guarded Active XML (GAXML) [5]. We show this
translation on two main GAXML steps: call and return of
services. For completeness, we briefly recall the main
features of GAXML here. AXML trees contain embedded function
calls that are evaluated (via tree pattern queries) in a
workspace. GAXML adds call and return guards that
control the function call and the return of the result (as a sibling
of the call node). Functions can be internal or external. The
external ones return some arbitrary forest that is consistent
with the static invariant Δ.

We describe here how to model an (internal) function call 
in GAXML with DTPRS. Let / G E be a function associ- 
ated with the argument query Q and the call guard G. The 
associated DTP rule has the same guard G, the set of queries 
Q = {Q}, and T = {Tf,Tx} is the set defined below: 

• Tf is a tree with three nodes, the root being labeled 
by /, and the two leaves labeled by the query Q and 
by the variable X, respectively. 

• Tx is a tree with a unique node labeled by the variable 
X, 

The locator L is given in Figure [T] Finally, \ maps the 
!/-node to T x and the WS-node to Tf. Applying the DTP 
rule amounts in evaluating Q to get the arguments of the 
call, writing them into the workspace WS, and creating a 
fresh identifier X that it copied both below WS and below 
the node with the function call (aka return address for /, 
see below). 

We now describe how to model the return of an (internal
continuous) function f ∈ Σ associated with the return query
Q and the return guard G. The associated DTP rule has the
same guard G, the set of queries 𝒬 = {Q}, ℱ = {T_Q} has

² We require that every tree in Init satisfies Δ.



[Figure: locator with a root node, a WS node labeled append(T_f), and a self node !f labeled ren_?f, append(T_X)]

Figure 1: The locator for service invocation.

a unique tree with a unique node labeled by the query Q,
and the locator is given in Figure 2, with the data constraint
X = Y:




Figure 2: The locator for service return. 



Finally, χ maps the node labeled append to T_Q. This
DTP rule locates where the call was computed in the workspace
WS using X, evaluates Q to get the return of the call, puts
the result of the return query as a sibling of ?f, and deletes
the associated data in the workspace, as well as the identifier
X.

3. MAILORDER EXAMPLE 

The following is a DTPRS description of the basic
functionalities of a Mailorder system for Play.com. For simplicity,
we represent only what happens on the Play.com peer,
although we could also model client peers, bank peers, etc.

The Play.com example can be compared with the Mailorder
example in [5]. Syntactically, GAXML uses guards and
queries. Most of the time, guards and queries are very simple
and can be encoded in the locator of DTP rules. In this case,
we omit the self label in our rules. Unlike Mailorder, we can
express deletion with DTPRS, and thus model the selection
of the products in the cart (adding and deleting products),
and also handle an inventory (how many items of a product
remain: each time an item is added to a cart, it is also
deleted from the inventory). More importantly, compared
with the recursion-free decidable restriction of GAXML, we
are able to represent the process of many customers ordering
many different products in our decidable fragment.

On the Play.com peer, there are a product catalog, a
customer catalog, a set of carts and a set of orders. The
inventory is encoded in the product catalog: if there are three
items of a product in the inventory, then there are three
tokens as children of the product. Each cart is associated
with a customer (at first anonymous, who can later log
in as a registered member). The cart is first in select
mode, which allows the associated customer to add/delete
products. Then the customer can check out: the cart gathers
the different prices for the products into a bill, and goes into
payment mode. When the customer pays, a corresponding
order is created with a receipt, the customer is disconnected
and the cart is deleted.

A simple example of a configuration of Play.com is
illustrated in Figure 3. We represent a data value only when
it is used by at least two different nodes.




Figure 3: A configuration of the Play.com system,
where the registered customer Blaise has the same
product twice in his cart, with PId 9221. One
additional item of product 9221 is left in the inventory.



Some key rewriting rules are described in the following.
We only describe nontrivial components in these rules. Tree
patterns are represented below in term form, with
descendant edges preceded by a - symbol.

• An anonymous customer can connect to Play.com with
the rule create-cart.

- L := [Play.com append(F)]

- F := [Cart]([nolog]([CId](X)), [products], [select]).
Notice that here X is a fresh data value.

• An anonymous customer can log in as a registered
member with the rule login. As the customer peer is not
modeled, we do not handle the check of a password,
although it would not be a problem to do so in our
framework.

- L := [Play.com](-[Customer]([CId](X)),
[Cart append(F)]([nolog del]))

- F := [log]([CId](X)). Here, X is the same CId
for the cart and for the customer.

• The rule Add-Product adds a new product into the cart
(and deletes a token from the inventory).

- L := [Play.com]([Cart]([products append(F)]),
-[Product]([PId](X), [token del]))

- F := [PId](X).

• The rule Delete-Product deletes a product from the
cart (and puts the token back).

- L := [Play.com]([Cart]([select], -[PId del](X)),
-[Product append(F)]([PId](X)))

- F := [token].

• The rule Check-out checks whether the cart is nonempty
and retrieves the prices of products in the cart into a
bill through a query. It changes the mode of the cart
from select to payment.

- L := [Play.com]([Cart self, append(F)](-[PId],
[select ren_payment]))

- F := [Bill](Q), and Q := body ⇝ Y is the query
with body = [Play.com]([Cart self](-[PId](X)),
-[Product]([PId](X), [Price](Y))).

• The customer can pay with the rule Pay, and a
corresponding order is created. For simplicity, the rule
disconnects the customer and transforms the cart into an
order. The order contains the customer ID, an order
ID (a fresh unique identifier), and the total price (the sum
of the prices of the items). As we model prices by
data values and do not use any arithmetic, the
total price is a fresh data value. The only important
point is that this data value Total is the same as the one
registered in the bank account, which we could check
for equality (although we do not explicitly model the
bank here). The order does not record the individual
PIds of the products, since there will be no more products
to put back in the inventory (and, in any case, the product
can later be removed from the catalog).

- L := [Play.com]([Cart append(F), ren_Order]([Bill del],
[products del], [payment ren_paid], [log ren_cust]))

- F := [Receipt]([OId](X), [Total](Y))

• The rule Add-member allows a customer to register as
a member.

- L := [Play.com]([CCatalog append(F)])

- F := [Customer]([CId](X), [Name](Y), [Email](Z))

We do not specify here the following rules, which are easy
to come up with: shipped, delivered, add product to catalog,
etc.

Notice that this example is intentionally incorrect.
Indeed, as there is no check of the state of the cart in
rule Add-Product, it is possible that a bill is produced after
a check-out, and the customer can still add products with
the aforementioned rule, which will never be accounted for
in the bill. We will show later that we can decide whether
such a problem occurs or not. To fix this problem, it suffices
to check in the locator L of the Add-Product rule that the cart
is in select mode.
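Because this bug involves a single cart, it already shows up on a drastically abstracted model. The Python sketch below is not the DTPRS semantics: it is a hypothetical abstraction to one cart and one product type, with counters instead of trees and the rule Pay omitted. A breadth-first search over the finite abstract state space decides whether the bad pattern "cart in payment mode holding an item that is not on the bill" is reachable, with and without the mode check suggested above.

```python
from collections import deque
from typing import Iterator, Optional, Tuple

# Hypothetical abstraction of ONE cart and ONE product type:
# (mode, items in the cart, items accounted for in the bill, tokens left in the inventory)
State = Tuple[str, int, Optional[int], int]

def successors(s: State, check_mode: bool) -> Iterator[State]:
    mode, cart, billed, tokens = s
    if tokens > 0 and (mode == "select" or not check_mode):   # Add-Product (fixed: select only)
        yield (mode, cart + 1, billed, tokens - 1)
    if mode == "select" and cart > 0:
        yield (mode, cart - 1, billed, tokens + 1)            # Delete-Product
        yield ("payment", cart, cart, tokens)                  # Check-out: bill the current content

def bad(s: State) -> bool:
    mode, cart, billed, _ = s
    return mode == "payment" and cart > billed                 # some item is missing from the bill

def bad_reachable(tokens: int, check_mode: bool) -> bool:
    """Breadth-first search of the finite abstract state space for the bad pattern."""
    start: State = ("select", 0, None, tokens)
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if bad(s):
            return True
        for t in successors(s, check_mode):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False
```

With two tokens in the inventory, the unguarded Add-Product lets a second item slip in after Check-out, whereas with the mode check the bad pattern is unreachable. This finite abstraction says nothing about many carts and unbounded data, which is what the decidability results of Section 5 address.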

4. UNDECIDABILITY 

As one might expect, analysis of DTPRS quickly becomes
undecidable, sometimes even without using any unbounded
data. The proof of the proposition below is obtained by
straightforward simulations of 2-counter machines.

Proposition 2. Both termination and pattern
reachability for DTPRS (ℛ, Δ) are undecidable whenever one of the
following holds:

1. the DTD in Δ is recursive,

2. either guards in ℛ or the invariant Δ contain negated
DTPs.

The above result holds even without data.

The next result shows that with data, we can relax both
conditions above and still get undecidability of DTPRS. The
main idea is to use data for creating long horizontal paths
(although trees are supposed to be unordered). Such
horizontal paths can be obtained e.g. with a tree of depth 2,
having subtrees of height one with 2 leaves each, say labeled
by data values d_i, d_{i+1}. Assuming all d_i are distinct (and
distinguishing d_1), we get a linear order on these subtrees.
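The mechanism behind such horizontal paths can be checked directly: although the children of the root are unordered, shared data values determine a sequence. In this Python sketch each height-one subtree is summarized, as a hypothetical representation, by its tag and the set of its two data values; starting from the distinguished value d_1 and following shared values reads off the encoded word, and a branching chain is rejected.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Tuple

Subtree = Tuple[str, FrozenSet[Hashable]]   # (tag of a height-one subtree, its two data values)

def read_word(subtrees: List[Subtree], first: Hashable) -> str:
    """Follow shared data values from the distinguished value `first`, reading one tag per subtree."""
    adj: Dict[Hashable, List[Tuple[Hashable, str, int]]] = defaultdict(list)
    for idx, (tag, leaves) in enumerate(subtrees):
        a, b = tuple(leaves)                 # requires two distinct data values
        adj[a].append((b, tag, idx))
        adj[b].append((a, tag, idx))
    word: List[str] = []
    current, used = first, set()
    while True:
        nxt = [(v, tag, idx) for v, tag, idx in adj[current] if idx not in used]
        if not nxt:
            return "".join(word)
        if len(nxt) > 1:
            raise ValueError("data values do not form a single chain")
        v, tag, idx = nxt[0]
        used.add(idx)
        word.append(tag)
        current = v
```

For instance, the subtrees (b, {2, 3}), (a, {1, 2}), (a, {3, 4}), listed in this order, are read as aba from the value 1.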

Theorem 3. Both termination and pattern reachability
are undecidable for DTPRS (ℛ, Δ) such that (1) the DTD
in Δ is non-recursive and (2) all DTPs in guards from ℛ
and the invariant Δ are positive.

Proof. We first reduce the Post correspondence problem (PCP)
to pattern reachability. We may assume that our PCP
instance (u_i, v_i)_{1≤i≤n} is such that the following holds for
every non-empty sequence i_1, ..., i_k of indices:

• Either U = u_{i_1} ⋯ u_{i_k} and V = v_{i_1} ⋯ v_{i_k} are
incomparable, or V is a prefix of U. In the latter case we call
(U, V) a partial solution.

• If (U, V) and (Uu_i, Vv_i) are partial solutions and U ≠
V, then either Uu_i = Vv_i or Vv_i is a prefix of U.

• Every solution starts with the pair (u_1, v_1) and
ends with (u_n, v_n).

It is not hard to verify that the usual Turing machine
reduction to PCP satisfies the restrictions above.

A partial solution (U, V) with U = a_1 ⋯ a_n, V = a_1 ⋯ a_{m-1}
will be represented by the data tree below. In this data
tree the leaves are labeled by data d_i, with d_i ≠ d_j for all
i ≠ j. Moreover, notice that the last position has the
special marker $, and the first position in U without V has the
special marker #.
With each PCP pair (u_i, v_i), i < n, we associate DTPRS
rules R_i = (L, G, 𝒬, ℱ, χ). For simplicity we describe below
the locator L and the forest ℱ for (u_i, v_i) = (aba, bb) (the
guard G and the query set 𝒬 are both empty):




[Figure: the locator L for the pair (aba, bb), with leaves labeled by variables X_1, ..., X_6]



We need a rule R_i for each pair of tags c, d ∈ Σ (these
are the tags at positions n and m + 2 in the example with
|v_i| = 2). The forest F(aba), which will be added under the
root node, will contain 3 trees T_1, T_2, T_3, with roots labeled
a, b and a, $, respectively. Tree T_1 has two leaves, labeled
X_6 and X_7. Tree T_2 (T_3, resp.) has two leaves, labeled X_7
and X_8 (X_8 and X_9, resp.). Notice that variable X_6 occurs
in both L and F(aba), whereas X_7, X_8, X_9 will take fresh
(and mutually distinct) values.

The pair (u_n, v_n) has similar rules, except that we do not
append any forest to the root, but rename the root with a
special marker. The initial tree T_0 is defined as expected,
from (u_1, v_1). The PCP instance has a solution iff we can
reach a data tree whose root carries this special marker. Notice that all guards
and the invariant are empty.



For termination we can modify the above proof in order
to ensure that executions that do not correspond to partial
solutions are infinite. More precisely, if (U, V) as above is a
partial solution, but (Uu_i, Vv_i) is not, then we use a DTP rule
associated with (u_i, v_i) that forces an infinite execution. In
this way, termination will hold iff the PCP instance has a
solution. □

We end this section with a remark on the decidability of
termination from an initial set of trees. First we notice that,
already without data, DTPRS can simulate so-called
reset Petri nets [15]. These are Petri nets (or equivalently,
multi-counter automata without zero test) with additional
transitions that can reset places (equivalently, counters) to
zero. They can be represented by trees of depth 2, where
nodes at depth one represent places, and their respective
number of children (leaves) is the number of tokens on that
place. A DTPRS (without data) can easily simulate
increments, decrements and resets (using deletion in DTPRS). It
is known that so-called structural termination for reset Petri
nets is undecidable [18], i.e., the question whether there are
infinite computations from some initial configuration is
undecidable. This implies:

Proposition 4. The following question is undecidable:
Given a DTPRS (ℛ, Δ), is there some tree T₀ satisfying Δ
and an infinite computation T₀ → T₁ → ⋯ in (ℛ, Δ)?
This holds already for non-recursive DTDs in Δ and without
data constraints in DTPs.
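The depth-2 encoding of reset Petri nets used above can be sketched as follows. The locators in the comments are our own illustration of how each operation could be written as a DTP rule (the text does not spell them out), and all Python names are hypothetical; the functions simply perform the corresponding tree updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def encode(marking: Dict[str, int]) -> Node:
    """Depth-2 tree: one child per place, one token leaf per token on that place."""
    return Node("root", [Node(p, [Node("token") for _ in range(n)]) for p, n in marking.items()])

def decode(tree: Node) -> Dict[str, int]:
    return {p.label: len(p.children) for p in tree.children}

def place(tree: Node, p: str) -> Node:
    return next(c for c in tree.children if c.label == p)

def inc(tree: Node, p: str) -> None:
    # illustrative locator [root]([p append(F)]) with F = [token]
    place(tree, p).children.append(Node("token"))

def dec(tree: Node, p: str) -> bool:
    # illustrative locator [root]([p]([token del])): not enabled when p has no token
    node = place(tree, p)
    if not node.children:
        return False
    node.children.pop()
    return True

def reset(tree: Node, p: str) -> None:
    # illustrative locator [root append(F)]([p del]) with F = [p]:
    # delete the whole subtree of p, then re-create an empty place
    old = place(tree, p)
    tree.children = [c for c in tree.children if c is not old]
    tree.children.append(Node(p))
```

A decrement is simply not enabled on an empty place, which is how the absence of zero tests shows up; the reset needs deletion of a whole subtree, which is exactly what del provides.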

5. POSITIVE DTPRS 

In this section we consider positive DTPRS, a fragment 
of DTPRS for which we show that termination and pattern 
reachability are decidable. 

From Proposition 2 we know that in order to get
decidability, the DTD in the static invariant Δ must be
non-recursive. For a non-recursive DTD, there is some B such
that every tree satisfying the DTD has depth bounded by B.
In the following, we assume the existence of such a bound
B. Also from Proposition 2 we know that for decidability
we need to consider only positive guards and positive data
invariants.

However, from Theorem 3 we know that these
restrictions alone do not suffice to achieve decidability. We need
to disallow long linear orders created with the help of data.
For this, we introduce a last restriction, called simple-path
bounded, which is defined in the following.

Let T = (V, E, root, ℓ) be a data tree. The graph G(T)
associated with T is the undirected graph obtained by adding
to V the set of data values occurring in T, and adding to
E the links between a leaf labeled by a data value and the
node representing that value (see also Figure 3). Formally,
G(T) = (V′, E′), where V′ = V ∪ {ℓ(v) | ℓ(v) ∈ 𝒟} and
E′ = E ∪ {{v, d} | ℓ(v) = d ∈ 𝒟}. A simple path of T is a
simple path in G(T), i.e. a sequence of vertices v_1, ..., v_n in
G(T) such that {v_i, v_{i+1}} ∈ E′ for all i < n, and v_i ≠ v_j for all i ≠ j.
The length of a path v_1, ..., v_n is n − 1.
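For small trees, G(T) and the length of a longest simple path can be computed by brute force, which helps to check the simple-path-bounded condition on examples. This Python sketch (hypothetical names; exhaustive search, exponential in general) builds G(T) as an adjacency map with one vertex per tree node and one vertex per data value.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Set

@dataclass
class Node:
    label: object
    is_data: bool = False
    children: List["Node"] = field(default_factory=list)

def N(label, *kids): return Node(label, False, list(kids))
def D(value): return Node(value, True)

Graph = Dict[Hashable, Set[Hashable]]

def graph_of(root: Node) -> Graph:
    """G(T): tree edges plus an edge from every data leaf to a vertex for its value."""
    adj: Graph = {}
    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counter = 0
    def visit(n: Node):
        nonlocal counter
        me = ("node", counter)
        counter += 1
        adj.setdefault(me, set())
        if n.is_data:
            add_edge(me, ("value", n.label))   # a shared value vertex links equal data leaves
        for c in n.children:
            add_edge(me, visit(c))
        return me
    visit(root)
    return adj

def longest_simple_path(adj: Graph) -> int:
    """Length (number of edges) of a longest simple path; exhaustive, so small graphs only."""
    best = 0
    def dfs(v, seen: Set[Hashable], length: int) -> None:
        nonlocal best
        best = max(best, length)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                dfs(w, seen, length + 1)
                seen.remove(w)
    for start in adj:
        dfs(start, {start}, 0)
    return best
```

On a tree with two carts and one product, where the first cart and the product share the PId value 1, the shared value closes a cycle through the root, and the longest simple path has length 11, visiting all 12 vertices of G(T).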

Formally, a DTPRS (ℛ, Δ) is a positive DTPRS with set
of initial trees Init if:

• non-recursive DTD: the DTD in the static invariant
Δ is non-recursive. In particular, trees satisfying the
DTD have depth bounded by some B > 0.

• positive: all guards in ℛ and the data invariant in
Δ are positive Boolean combinations of DTPs. The
DTD in Δ is positive as well.

• simple-path bounded: there exists K > 0 such that
the length of any simple path in any T ∈ ℛ(T₀), for
any T₀ ∈ Init, is bounded by K.

Notice that the third condition above implies that all data
trees have depth bounded by K, since a path from the root to a
leaf is a simple path. So we always assume that B ≤ K.
Notice also that in positive DTPRS, data value inequality
is allowed in DTPs, that is, we can state that two
data values are different.

The Play.com example in Section 3 satisfies the first two
conditions above. However, in general, the third condition
is not satisfied. PIds can create a long path: a cart
can be linked to a product, linked to another cart, linked
to another product, etc. So the number of carts or the
number of products needs to be bounded (unless a cart can
contain at most one product). On the other hand, Name and
Total are fresh data values, so they cannot be used as links.
Moreover, CId can be used in different carts and orders, but as a
cart or order is associated with a unique customer, it cannot
create long links. More formally, if the system can handle
only G active carts at a time (but the number of orders
is unlimited), then the system has simple paths bounded
by 12G + 7. If there are at most D different products in
the catalog, then the system has simple paths bounded by
12D + 7. Finally, if each customer can have only one active
cart at a time (but she can have many orders), and each cart
has at most one product, then the system has simple paths
bounded by 14. Any of these restrictions can be described
using only positive rules. The rest of the section is devoted
to the proof of the following result:

Theorem 5. Given a positive DTPRS (ℛ, Δ), the pattern
reachability and the termination problems are decidable.

We prove Theorem 5 using the framework of well-structured transition systems (WSTS) [1, 13], which has been applied to DTPRS without data in [15]. We briefly recall some definitions. A WSTS is a triple $(S, \to, \preceq)$ such that S is an (infinite) state space, $\preceq$ is a well-quasi-ordering (wqo for short) on S, and $\to$ is the transition relation on S. It is required that $\to$ is compatible w.r.t. $\preceq$: for any $s, t, s' \in S$ with $s \to t$ and $s \preceq s'$, there exists $t' \in S$ such that $s' \to t'$ and $t \preceq t'$.
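To make the compatibility condition concrete, here is a minimal Python sketch on a toy stand-in rather than on data trees: states are vectors in ℕ^k ordered componentwise (a wqo by Dickson's lemma), and transitions are those of a vector addition system given as (pre, post) pairs. The helpers `leq`, `vas_succ` and `compatible_on` are hypothetical names introduced only for this illustration; searching a finite box of states can refute compatibility but never prove it.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Succ = Callable[[State], List[State]]


def leq(a: State, b: State) -> bool:
    """Componentwise order on N^k, a wqo by Dickson's lemma."""
    return all(x <= y for x, y in zip(a, b))


def vas_succ(transitions: Sequence[Tuple[State, State]]) -> Succ:
    """Successor function of a vector addition system given as (pre, post) pairs."""
    def succ(m: State) -> List[State]:
        return [tuple(x - p + q for x, p, q in zip(m, pre, post))
                for pre, post in transitions if leq(pre, m)]
    return succ


def compatible_on(states: Sequence[State], succ: Succ) -> bool:
    """Search `states` for s -> t and s <= s' such that no s' -> t' has t <= t'."""
    for s in states:
        for t in succ(s):
            for s2 in states:
                if leq(s, s2) and not any(leq(t, t2) for t2 in succ(s2)):
                    return False
    return True


box = list(product(range(4), repeat=2))
print(compatible_on(box, vas_succ([((1, 0), (0, 1))])))   # True: a VAS is monotone
# A zero test breaks compatibility: (0,) -> (1,), but (1,) has no successor.
print(compatible_on([(0,), (1,), (2,)], lambda m: [(1,)] if m == (0,) else []))   # False
```

The zero-test example shows why negative guards are excluded from positive DTPRS: a guard that only holds on "small" states is not preserved when moving to a larger state.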

Let $\mathcal{T}_{B,K}$ denote the set of data trees whose depths are bounded by B and whose simple paths have length bounded by K. From the definition of positive DTPRS, we know that $\mathcal{R}(Init) \subseteq \mathcal{T}_{B,K}$.

In the following, we prove Theorem 5 by showing that $(\mathcal{T}_{B,K}, \to, \preceq)$ is a WSTS.

We define the binary relation $\preceq$ on $\mathcal{T}_{B,K}$. Let $T_1 = (V_1, E_1, root_1, \ell_1)$ and $T_2 = (V_2, E_2, root_2, \ell_2)$ be in $\mathcal{T}_{B,K}$; then $T_1 \preceq T_2$ if there is an injective mapping $\phi$ from $V_1$ to $V_2$ such that

• root preservation: $\phi(root_1) = root_2$,

• parent-child relation preservation: $(v_1, v_2) \in E_1$ iff $(\phi(v_1), \phi(v_2)) \in E_2$,

• tag preservation: if $\ell_1(v) \in \Sigma$, then $\ell_1(v) = \ell_2(\phi(v))$,

• data value (in)equality preservation: if $v_1, v_2 \in V_1$ and $\ell_1(v_1), \ell_1(v_2) \in \mathcal{D}$, then $\ell_2(\phi(v_1)), \ell_2(\phi(v_2)) \in \mathcal{D}$, and $\ell_1(v_1) = \ell_1(v_2)$ iff $\ell_2(\phi(v_1)) = \ell_2(\phi(v_2))$.

3 A wqo $\preceq$ is a reflexive, transitive and well-founded relation with no infinite antichain.

It is easy to see that $\preceq$ is reflexive and transitive, so it is a quasi-order. In the following, we first assume that $\preceq$ is a wqo on $\mathcal{T}_{B,K}$ and show that $\to$ is compatible with $\preceq$, in order to prove Theorem 5. We show in Section 5.3 that $\preceq$ is indeed a wqo: for any infinite sequence of data trees $T_0, T_1, \ldots \in \mathcal{T}_{B,K}$, there are $i < j$ such that $T_i \preceq T_j$.
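The four conditions defining $\preceq$ can be checked directly on very small trees. The following Python sketch does so by brute force over all injective maps; the encoding of a data tree as (root, parent map, labels), with labels of the form `("tag", a)` or `("data", d)`, and the function name `embeds` are assumptions made for illustration only. It is exponential and is not meant as a decision procedure for large trees.

```python
from itertools import permutations
from typing import Dict, Hashable, Optional, Tuple

Label = Tuple[str, Hashable]   # ("tag", a) with a in Sigma, or ("data", d) with d in D
# (root, parent map with parent[root] = None, node labels)
DataTree = Tuple[Hashable, Dict[Hashable, Optional[Hashable]], Dict[Hashable, Label]]


def embeds(t1: DataTree, t2: DataTree) -> bool:
    """Brute-force test of T1 ⪯ T2; exponential, so only for tiny trees."""
    root1, parent1, label1 = t1
    root2, parent2, label2 = t2
    nodes1 = list(label1)
    data1 = [v for v in nodes1 if label1[v][0] == "data"]
    for image in permutations(list(label2), len(nodes1)):    # all injective maps V1 -> V2
        phi = dict(zip(nodes1, image))
        if phi[root1] != root2:                                # root preservation
            continue
        if any((parent1[v] == u) != (parent2[phi[v]] == phi[u])
               for u in nodes1 for v in nodes1):               # parent-child, both directions
            continue
        if any(label1[v][0] == "tag" and label1[v] != label2[phi[v]]
               for v in nodes1):                               # tag preservation
            continue
        if any(label2[phi[v]][0] != "data" for v in data1):    # data nodes go to data nodes
            continue
        if any((label1[u] == label1[v]) != (label2[phi[u]] == label2[phi[v]])
               for u in data1 for v in data1):                 # data (in)equality preservation
            continue
        return True
    return False


small = ("r", {"r": None, "x": "r", "y": "r"},
         {"r": ("tag", "shop"), "x": ("data", 1), "y": ("data", 1)})
big = ("s", {"s": None, "a": "s", "b": "s", "c": "s"},
       {"s": ("tag", "shop"), "a": ("data", 5), "b": ("data", 5), "c": ("tag", "cart")})
print(embeds(small, big))   # True: equal values 1, 1 are sent to equal values 5, 5
```

Note that only the pattern of equalities between data values matters, not the values themselves.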

5.1 Well-structure of positive DTPRS 

Let $(\mathcal{R}, \Delta)$ be a positive DTPRS.

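Proposition 6. Let $T_1, T_2, T_1' \in \mathcal{T}_{B,K}$ be such that $T_1 \xrightarrow{R} T_2$ for some $R \in \mathcal{R}$, and $T_1 \preceq T_1'$. Then there exists $T_2' \in \mathcal{T}_{B,K}$ such that $T_1' \xrightarrow{R} T_2'$ and $T_2 \preceq T_2'$.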

Proof. Let $R = (L, G, Q, \mathcal{F}, \chi)$. Take an injective mapping $\phi: T_1 \to T_1'$ preserving the root, the parent-child relation, tags, and the data (in)equality relation, and an injective matching $\psi: L \to T_1$ satisfying the data constraint cond of L. Then $\phi \circ \psi: L \to T_1'$ is an injective matching which respects the parent-child relation, tags, and the data (in)equality relation. Hence cond is satisfied by $\phi \circ \psi$ too. As G is positive, if G is true at $T_1$ w.r.t. $\psi$, then it is true at $T_1'$ w.r.t. $\phi \circ \psi$ as well. Applying the rule R to $T_1'$ w.r.t. $\phi \circ \psi$, we get a tree $T_2'$ such that $T_2 \preceq T_2'$. As both the DTD and the data invariant in $\Delta$ are positive and $T_2$ fulfills $\Delta$, so does $T_2'$. Thus

$T_1' \xrightarrow{R} T_2'$. □

Consequently, $\to$ is compatible w.r.t. $\preceq$ on $\mathcal{T}_{B,K}$, thus $(\mathcal{T}_{B,K}, \to, \preceq)$ is a WSTS.

In the following, we prove that $(\mathcal{T}_{B,K}, \to, \preceq)$ satisfies some additional computability conditions, needed to show the decidability of pattern reachability and termination.

First consider pattern reachability. To get the decidability of this problem, by Theorem 3.6 in [13], we need to show that $(\mathcal{T}_{B,K}, \to, \preceq)$ has effective pred-basis. A WSTS $(S, \to, \preceq)$ has effective pred-basis if there exists an algorithm that computes, for any state $s \in S$, a finite basis pb(s) of the upward-closed set $\uparrow Pred(\uparrow s)$. Here, $\uparrow I = \{s' \in S \mid \exists s \in I \text{ s.t. } s \preceq s'\}$ denotes the upward closure of I w.r.t. $\preceq$, and $Pred(I) = \{s \in S \mid \exists t \in I, s \to t\}$ the set of immediate predecessors of states in I. A basis of an upward-closed set I is a minimal set $I_b$ such that $I = \bigcup_{x \in I_b} \uparrow x$. Recall that whenever $\preceq$ is a wqo, the basis $I_b$ of an upward-closed set I is finite.

Proposition 7. $(\mathcal{T}_{B,K}, \to, \preceq)$ has effective pred-basis.

A solution for reachability of a given DTP P from an initial set of data trees Init is obtained by backward exploration: we start with $I^0$, the set of data trees matching P and satisfying $\Delta$. Then we iteratively compute the upward-closed sets $I^{n+1} = I^n \cup (Pred(I^n) \cap \Delta)$, representing each set by its finite basis. Since the sequence $I^n$ is increasing by construction, and since $\preceq$ is a wqo, the sequence must stabilize, and stabilization can be effectively tested. If $I^n = I^{n+1}$, then it suffices to check whether $I^n \cap Init = \emptyset$. Notice that we did not impose any restriction on the set Init of initial trees. We need to test the existence of a data tree from $\mathcal{T}_{B,K}$ satisfying an (arbitrary) Boolean combination of DTPs and an (arbitrary) DTD. This problem is undecidable in general [8], but becomes decidable in the special case where trees have bounded depth [8]. Here we additionally need to restrict to trees from $\mathcal{T}_{B,K}$, but the same proof ideas as in [8] apply in order to infer decidability.
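The backward exploration above is the generic WSTS coverability algorithm. A minimal Python sketch follows, again on a toy stand-in (a vector addition system on ℕ^k with the componentwise order), where the pred-basis of $\uparrow m$ for a transition (pre, post) is the single vector pre + max(m − post, 0). The invariant $\Delta$ is omitted, and the names `pred_basis` and `backward_coverable` are hypothetical; this is not the data-tree pred-basis of Proposition 7.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, ...]


def leq(a: Vec, b: Vec) -> bool:
    return all(x <= y for x, y in zip(a, b))


def pred_basis(m: Vec, pre: Vec, post: Vec) -> Vec:
    """Minimal s with s >= pre and s - pre + post >= m: the basis of Pred(up m) for one transition."""
    return tuple(p + max(x - q, 0) for x, p, q in zip(m, pre, post))


def backward_coverable(target: Vec, transitions: Sequence[Tuple[Vec, Vec]], init: Vec) -> bool:
    basis: List[Vec] = [target]       # finite basis of the current upward-closed set I^n
    frontier: List[Vec] = [target]    # basis elements whose predecessors are not explored yet
    while frontier:                   # each new element strictly enlarges I^n: finitely often in a wqo
        new: List[Vec] = []
        for m in frontier:
            for pre, post in transitions:
                s = pred_basis(m, pre, post)
                if any(leq(b, s) for b in basis):
                    continue          # s is already in the upward closure
                basis = [b for b in basis if not leq(s, b)] + [s]
                new.append(s)
        frontier = new
    return any(leq(b, init) for b in basis)   # does the fixpoint contain the initial state?


net = [((1, 0), (0, 1))]              # move one token from place 0 to place 1
print(backward_coverable((0, 2), net, (2, 0)))   # True
print(backward_coverable((0, 2), net, (1, 0)))   # False
```

Representing each $I^n$ only by its minimal elements is what makes the fixpoint test effective.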

Now consider the termination problem. By Theorem 4.6 in [13], to show the decidability of the termination problem from a single initial tree $T_0$, it is sufficient to show that $(\mathcal{R}(T_0), \to, \preceq)$ has effective Succ, i.e. for each $T \in \mathcal{R}(T_0)$, the set $Succ(T) := \{T' \mid T \to T'\}$ is computable. Then we can compute the finite reachability tree starting from $T_0$: we compute trees T such that $T_0 \to^* T$, and we stop whenever we find $T \preceq T'$ along some branch.
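The finite reachability tree construction can be sketched as follows, once more on a toy vector addition system instead of data trees (the names `successors` and `terminates` are hypothetical). A branch is cut as soon as a state dominates one of its ancestors; by compatibility this yields an infinite run, while the wqo property together with finite branching guarantees that the explored tree is finite.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, ...]


def leq(a: Vec, b: Vec) -> bool:
    return all(x <= y for x, y in zip(a, b))


def successors(m: Vec, transitions: Sequence[Tuple[Vec, Vec]]) -> List[Vec]:
    return [tuple(x - p + q for x, p, q in zip(m, pre, post))
            for pre, post in transitions if leq(pre, m)]


def terminates(init: Vec, transitions: Sequence[Tuple[Vec, Vec]]) -> bool:
    """Explore the finite reachability tree; stop when some ancestor T satisfies T <= T'."""
    stack = [(init, (init,))]            # (current state, states on its branch)
    while stack:
        m, branch = stack.pop()
        for m2 in successors(m, transitions):
            if any(leq(a, m2) for a in branch):
                return False             # T ->+ T' with T <= T': compatibility gives an infinite run
            stack.append((m2, branch + (m2,)))
    return True                          # every branch ended in a state without successors


print(terminates((2, 0), [((1, 0), (0, 1))]))   # True: tokens only move from place 0 to place 1
print(terminates((1, 0), [((1, 0), (2, 0))]))   # False: (1,0) -> (2,0) and (1,0) <= (2,0)
```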

It is not hard to see that Succ(T) contains only finitely many equivalence classes induced by the quasi-order $\preceq$. Since the DTPRS $(\mathcal{R}, \Delta)$ cannot distinguish between two distinct data trees belonging to the same equivalence class, by selecting one data tree from each equivalence class we get a finite representation of Succ(T); therefore, $(\mathcal{R}(T_0), \to, \preceq)$ has effective Succ.

5.2 Tree Decompositions 

In order to prove that $\preceq$ is a wqo over $\mathcal{T}_{B,K}$, we first represent a data tree T as a (labeled) undirected graph $G_\ell(T)$, then we encode $G_\ell(T)$ into a tree (without data) of bounded depth using the concept of tree decompositions. Define a binary relation $\le$ on (labeled) trees of bounded depth as follows: $T_1 \le T_2$ if there is an injective mapping from $T_1$ to $T_2$ preserving the root, the tags, and the parent-child relation. It is known that $\le$ is a wqo on labeled trees of bounded depth without data [15].

Let $\mathcal{G}_K$ be the set of labeled graphs in which all simple paths have length bounded by K. We show that $\preceq$ on $\mathcal{T}_{B,K}$ corresponds to the induced subgraph relation (formally defined later) on $\mathcal{G}_K$, and that the fact that $\le$ is a wqo for labeled trees of bounded depth implies that the induced subgraph relation is a wqo on $\mathcal{G}_K$.

Given a data tree $T = (V, E, root, \ell) \in \mathcal{T}_{B,K}$, the labeled undirected graph representation $G_\ell(T)$ of T is obtained from G(T), the graph associated with T, by adding labels encoding information about data tree nodes (tag, depth, ...). Formally, $G_\ell(T)$ is a $((\Sigma \cup \{\#\}) \times [B+1]) \cup \{\$\}$-labeled (where $[B+1] = \{0, 1, \ldots, B\}$) undirected graph $(V', E', \ell')$ defined as follows:

• $V' = V \cup \{\ell(v) \mid v \in V, \ell(v) \in \mathcal{D}\}$,

• $E' = E \cup \{\{v, d\} \mid v \in V, \ell(v) = d \in \mathcal{D}\}$,

• if $\ell(v) \in \Sigma$, then $\ell'(v) = (\ell(v), i)$, otherwise $\ell'(v) = (\#, i)$, where i is the depth of v in T. In addition, $\ell'(d) = \$$ for each $d \in V' \cap \mathcal{D}$.

Let $\Sigma_G$ denote $((\Sigma \cup \{\#\}) \times [B+1]) \cup \{\$\}$.
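The construction of $G_\ell(T)$ is mechanical; the Python sketch below builds it from a data tree given as a children map and a label map. The label encoding (`("tag", a)` or `("data", d)`), the representation of a data-value vertex as `("value", d)`, and the name `labeled_graph` are assumptions for illustration; node identifiers are assumed not to collide with these value vertices.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Label = Tuple[str, Hashable]   # ("tag", a) with a in Sigma, or ("data", d) with d in D


def labeled_graph(root: Hashable,
                  children: Dict[Hashable, List[Hashable]],
                  label: Dict[Hashable, Label]):
    """Return (vertex labels, undirected edges) of G_l(T)."""
    vlabel: Dict[Hashable, object] = {}
    edges: Set[FrozenSet[Hashable]] = set()
    depth = {root: 0}
    queue = deque([root])
    while queue:                                   # BFS computes the depth of every node
        v = queue.popleft()
        kind, value = label[v]
        if kind == "tag":
            vlabel[v] = (value, depth[v])          # (tag, depth)
        else:
            vlabel[v] = ("#", depth[v])            # data node: (#, depth)
            d = ("value", value)                   # one shared vertex per distinct data value
            vlabel[d] = "$"
            edges.add(frozenset((v, d)))
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            edges.add(frozenset((v, c)))           # parent-child edge, now undirected
            queue.append(c)
    return vlabel, edges


tree_children = {"r": ["c1", "c2"], "c1": ["d1"], "c2": ["d2"]}
tree_labels = {"r": ("tag", "shop"), "c1": ("tag", "cart"), "c2": ("tag", "cart"),
               "d1": ("data", 7), "d2": ("data", 7)}
vlabel, edges = labeled_graph("r", tree_children, tree_labels)
print(len(vlabel), len(edges))   # 6 6: the two data nodes share the vertex ("value", 7)
```

Sharing one vertex per data value is what turns data equalities into graph edges, and is why simple paths in $G_\ell(T)$ can be much longer than the depth of T.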

For $\Sigma_G$-labeled graphs, we define the induced subgraph relation as follows. Let $G_1 = (V_1, E_1, \ell_1)$ and $G_2 = (V_2, E_2, \ell_2)$ be two $\Sigma_G$-labeled graphs; then $G_1$ is an induced subgraph of $G_2$ (denoted $G_1 \sqsubseteq G_2$) iff there is an injective mapping $\phi$ from $V_1$ to $V_2$ such that

• label preservation: $\ell_1(v_1) = \ell_2(\phi(v_1))$ for any $v_1 \in V_1$,

• edge preservation: for $v_1, v_1' \in V_1$, $\{v_1, v_1'\} \in E_1$ iff $\{\phi(v_1), \phi(v_1')\} \in E_2$.
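For intuition, the induced subgraph relation can be tested by brute force on tiny labeled graphs. In the sketch below, a graph is a hypothetical pair (label map, set of 2-element frozensets), and `is_induced_subgraph` and `edge_set` are illustrative names; note that non-edges must be preserved as well as edges.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Graph = Tuple[Dict[Hashable, object], Set[FrozenSet[Hashable]]]   # (labels, edges)


def is_induced_subgraph(g1: Graph, g2: Graph) -> bool:
    """Brute-force test of G1 ⊑ G2 (exponential; for tiny graphs only)."""
    (lab1, e1), (lab2, e2) = g1, g2
    v1 = list(lab1)
    for image in permutations(list(lab2), len(v1)):   # all injective maps V1 -> V2
        phi = dict(zip(v1, image))
        if any(lab1[v] != lab2[phi[v]] for v in v1):
            continue                                   # label preservation fails
        if all((frozenset((a, b)) in e1) == (frozenset((phi[a], phi[b])) in e2)
               for i, a in enumerate(v1) for b in v1[i + 1:]):
            return True                                # edges and non-edges preserved
    return False


def edge_set(*pairs):
    return {frozenset(p) for p in pairs}


path = ({1: "a", 2: "b", 3: "a"}, edge_set((1, 2), (2, 3)))
triangle = ({1: "a", 2: "b", 3: "a"}, edge_set((1, 2), (2, 3), (1, 3)))
two_as = ({"x": "a", "y": "a"}, set())
print(is_induced_subgraph(two_as, path))       # True: the two a's are not adjacent
print(is_induced_subgraph(two_as, triangle))   # False: the two a's are adjacent
```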

From the definition of the labeled graph representation of data trees, it is not hard to show that the induced subgraph relation $\sqsubseteq$ corresponds to the relation $\preceq$ on data trees.



Proposition 8. Let $T_1, T_2 \in \mathcal{T}_{B,K}$; then $T_1 \preceq T_2$ iff $G_\ell(T_1) \sqsubseteq G_\ell(T_2)$.

Now we show how to encode any $\Sigma_G$-labeled graph belonging to $\mathcal{G}_K$ into a labeled tree of bounded depth by using tree decompositions.

Let $G = (V, E, \ell)$ be a connected $\Sigma_G$-labeled graph; then a tree decomposition of G is a quadruple $\mathcal{T} = (U, F, r, \theta)$ such that:

• (U, F, r) is a tree with tree domain U, parent-child relation F, and root $r \in U$,

• $\theta: U \to 2^V$ is a labeling function attaching to each node $u \in U$ a set of vertices of G,

• for each edge $\{v, w\} \in E$, there is a node $u \in U$ such that $\{v, w\} \subseteq \theta(u)$,

• for each vertex $v \in V$, the set of nodes $u \in U$ such that $v \in \theta(u)$ constitutes a connected subgraph of $\mathcal{T}$.

The sets $\theta(u)$ are called the bags of the tree decomposition.

The depth of a tree decomposition $\mathcal{T} = (U, F, r, \theta)$ is the depth of the tree (U, F, r), and the width of $\mathcal{T}$ is defined as $\max\{|\theta(u)| - 1 \mid u \in U\}$. The tree-width of a graph G = (V, E) is the minimum width of the tree decompositions of G. For a tree decomposition of width K of a graph G, without loss of generality, we assume that each bag is given by a sequence of vertices $v_0 \ldots v_K$ of length K + 1, with possible repetitions, i.e. possibly $v_i = v_j$ for some $i \ne j$ (tree decompositions in this form are sometimes called ordered tree decompositions).

Theorem 9 ([19, 5]). If $G \in \mathcal{G}_K$, then G has a tree decomposition with both depth and width bounded by K.

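Proof. Let $G = (V, E, \ell) \in \mathcal{G}_K$ and let $T = (V, E_T, r)$ be a depth-first-search tree of G with root $r \in V$. Every root-to-node path of T is a simple path of G, so T has depth at most K. For each $v \in V$, let $\theta(v)$ be the union of {v} and the set of all ancestors of v in T. In a DFS tree every edge of G joins an ancestor and a descendant, so every edge lies in some bag; moreover, the nodes whose bag contains v form the subtree of T rooted at v, which is connected. Hence $(V, E_T, r, \theta)$ is a tree decomposition of G of depth at most K and width at most K. □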
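The proof of Theorem 9 is constructive. The Python sketch below builds the DFS tree, takes each bag to be a vertex together with its ancestors, and checks the two tree-decomposition conditions (a set of nodes of a rooted tree is connected iff exactly one of its nodes has its parent outside the set). The adjacency-list input and the names `dfs_tree_decomposition` and `is_tree_decomposition` are assumptions for illustration; bags here are unordered sets rather than the ordered sequences used later.

```python
from typing import Dict, Hashable, List, Optional, Set


def dfs_tree_decomposition(adj: Dict[Hashable, List[Hashable]], root: Hashable):
    """DFS tree of a connected graph; the bag of v is v plus all its DFS ancestors."""
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}

    def visit(v: Hashable) -> None:
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                visit(w)

    visit(root)
    bags: Dict[Hashable, Set[Hashable]] = {}
    for v in parent:
        bag, u = {v}, parent[v]
        while u is not None:                 # climb to the root, collecting ancestors
            bag.add(u)
            u = parent[u]
        bags[v] = bag
    return parent, bags


def is_tree_decomposition(adj, parent, bags) -> bool:
    for v in adj:                            # every edge lies inside some bag
        for w in adj[v]:
            if not any({v, w} <= bag for bag in bags.values()):
                return False
    for v in adj:                            # nodes holding v form one connected subtree
        holders = {u for u, bag in bags.items() if v in bag}
        tops = [u for u in holders if parent[u] not in holders]
        if len(tops) != 1:
            return False
    return True


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
parent, bags = dfs_tree_decomposition(cycle, 0)
print(is_tree_decomposition(cycle, parent, bags))    # True
print(max(len(bag) for bag in bags.values()) - 1)    # 3, the longest simple path of the 4-cycle
```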

As a matter of fact, the converse of Theorem 9 holds as well.

Proposition 10. If G has a tree decomposition of width at most A and depth at most B, then the length of any simple path of G is bounded by $(A+2)^B + \sum_{1 \le i < B} (A+2)^i$.

So, generally speaking, the simple paths of all graphs in a class have uniformly bounded length iff all graphs in the class have tree decompositions of uniformly bounded depth and width.

Now we describe how to encode labeled graphs by labeled 
trees using tree decompositions. 

Let $G = (V, E, \ell) \in \mathcal{G}_K$ be a $\Sigma_G$-labeled graph, and let $\mathcal{T} = (U, F, r, \theta)$ be a tree decomposition of G with width K and depth at most K. Remember that each $\theta(u)$ is represented as a sequence of exactly K + 1 vertices, and $[K+1] = \{0, \ldots, K\}$. Define

$$\Sigma_{G,K} := (\Sigma_G)^{K+1} \times 2^{[K+1]^2} \times 2^{[K+1]^2} \times 2^{[K+1]^2}.$$

We transform $\mathcal{T} = (U, F, r, \theta)$ into a $\Sigma_{G,K}$-labeled tree $\mathcal{T}' = (U, F, r, \eta)$, which encodes in a uniform way the information about G (including edge relations and vertex labels). The labeling $\eta: U \to \Sigma_{G,K}$ is defined as follows. Let $\theta(u) = v_0 \cdots v_K$; then $\eta(u) = (\ell(v_0) \cdots \ell(v_K), \Lambda)$, where $\Lambda = (\Lambda_1, \Lambda_2, \Lambda_3)$ and

• $\Lambda_1 = \{(i, j) \mid 0 \le i, j \le K, v_i = v_j\}$,

• $\Lambda_2 = \{(i, j) \mid 0 \le i, j \le K, \{v_i, v_j\} \in E\}$,

• if u = r, then $\Lambda_3 = \emptyset$; otherwise, let u' be the parent of u in $\mathcal{T}$ and $\theta(u') = v_0' \cdots v_K'$; then $\Lambda_3 = \{(i, j) \mid 0 \le i, j \le K, v_i = v_j'\}$.
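The labeling $\eta$ only looks at one bag and its parent bag, so it is straightforward to compute. A minimal Python sketch, assuming ordered bags given as tuples of length K + 1, a parent map with `None` at the root, edges as 2-element frozensets, and a vertex-label map (the name `encode_bags` is hypothetical):

```python
from typing import Dict, FrozenSet, Hashable, Optional, Sequence, Set


def encode_bags(bags: Dict[Hashable, Sequence[Hashable]],
                parent: Dict[Hashable, Optional[Hashable]],
                edges: Set[FrozenSet[Hashable]],
                vlabel: Dict[Hashable, object]):
    """eta(u) = (l(v_0)...l(v_K), (Lambda_1, Lambda_2, Lambda_3)) for every node u."""
    eta = {}
    for u, bag in bags.items():
        idx = range(len(bag))                          # positions 0..K
        labels = tuple(vlabel[v] for v in bag)
        lam1 = frozenset((i, j) for i in idx for j in idx if bag[i] == bag[j])
        lam2 = frozenset((i, j) for i in idx for j in idx
                         if frozenset((bag[i], bag[j])) in edges)
        if parent[u] is None:
            lam3 = frozenset()                         # the root has no parent bag
        else:
            pbag = bags[parent[u]]
            lam3 = frozenset((i, j) for i in idx for j in idx if bag[i] == pbag[j])
        eta[u] = (labels, (lam1, lam2, lam3))
    return eta


# Graph x - y with labels a, b; ordered decomposition of width K = 1.
eta = encode_bags(bags={"r": ("x", "x"), "c": ("x", "y")},
                  parent={"r": None, "c": "r"},
                  edges={frozenset(("x", "y"))},
                  vlabel={"x": "a", "y": "b"})
print(eta["c"])
```

The root bag repeats x, which is allowed since bags are sequences with possible repetitions; $\Lambda_1$ records exactly such repetitions, and $\Lambda_3$ records which positions of a bag are shared with its parent bag.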

5.3 Well-quasi-ordering for data trees 

The encoding of labeled graphs into labeled trees establishes a connection between the wqo $\le$ on labeled trees and the induced subgraph relation $\sqsubseteq$ on labeled graphs.

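Proposition 11. Let $G_1, G_2$ be two $\Sigma_G$-labeled graphs with tree-width bounded by K, and let $\mathcal{T}_1, \mathcal{T}_2$ be tree decompositions of width K of $G_1$ and $G_2$, respectively. Then the two $\Sigma_{G,K}$-labeled trees $\mathcal{T}_1', \mathcal{T}_2'$ obtained from $\mathcal{T}_1, \mathcal{T}_2$ satisfy: if $\mathcal{T}_1' \le \mathcal{T}_2'$, then $G_1 \sqsubseteq G_2$.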

Proof. Let $G_i = (V_i, E_i, \ell_i)$, $\mathcal{T}_i = (U_i, F_i, r_i, \theta_i)$ and $\mathcal{T}_i' = (U_i, F_i, r_i, \eta_i)$ for $i = 1, 2$. Suppose that $\mathcal{T}_1' \le \mathcal{T}_2'$. Then there is an injective mapping $\phi$ from $U_1$ to $U_2$ preserving the root, the parent-child relation and the node labels.

Define a mapping $\pi: V_1 \to V_2$ as follows:

For $v \in V_1$, select some $u \in U_1$ such that $\theta_1(u) = v_0 \cdots v_K$ and $v = v_i$ for some i. Writing $\theta_2(\phi(u)) = v_0' \cdots v_K'$, we let $\pi(v) = v_i'$.

First we show that $\pi$ depends neither on the choice of i such that $v = v_i$, nor on the choice of $u \in U_1$ such that $v \in \theta_1(u)$. The former holds because $\eta_1(u) = \eta_2(\phi(u))$ (and in particular the component $\Lambda_1$ is preserved); hence if $v_i = v_j$, then we also have $v_i' = v_j'$.

For the latter, notice that the $\Lambda_3$ component of $\eta_1(u) = \eta_2(\phi(u))$ is preserved, hence choosing u or its parent makes no difference. Now, the set $\{u \in U_1 \mid v \in \theta_1(u)\}$ is a connected subgraph of $\mathcal{T}_1$ by definition of tree decomposition, hence $\pi$ does not depend on the choice of $u \in U_1$.

Now we show that $\pi$ is injective. Let $v_2$ be a vertex of $G_2$. Because $\Lambda_1$ is preserved, no two different vertices $v, v'$ of $G_1$ with $v, v' \in \theta_1(u)$ can satisfy $\pi(v) = \pi(v') = v_2$. Because $\Lambda_3$ is preserved, no two different vertices $v, v'$ with $v \in \theta_1(u)$ and $v' \in \theta_1(u')$, where u' is the parent of u, can satisfy $\pi(v) = \pi(v') = v_2$. Again, as the set $\{u \in U_2 \mid v_2 \in \theta_2(u)\}$ is a connected subgraph of $\mathcal{T}_2$, it follows that $\pi$ is injective.

We finish the proof by showing that $\pi$ preserves the node labels and the edge relation.

Node-label preservation: Suppose $\pi(v) = v'$. Then there exists some $u \in U_1$ such that $\theta_1(u) = v_0 \cdots v_K$, $v = v_i$ for some i, $\theta_2(\phi(u)) = v_0' \cdots v_K'$, and $v' = v_i'$. Since $\eta_1(u) = \eta_2(\phi(u))$, we have $\ell_1(v_0) \cdots \ell_1(v_K) = \ell_2(v_0') \cdots \ell_2(v_K')$, and it follows that $\ell_1(v) = \ell_1(v_i) = \ell_2(v_i') = \ell_2(v')$.

Edge relation preservation: We show that $\{v, w\} \in E_1$ iff $\{\pi(v), \pi(w)\} \in E_2$ for any $v, w \in V_1$.

If $\{v, w\} \in E_1$, there exists $u \in U_1$ such that $\theta_1(u) = v_0 \cdots v_K$, $v = v_i$ and $w = v_j$ for some i, j, so $(i, j) \in \Lambda_2(u)$ in $\mathcal{T}_1'$. Then $(i, j) \in \Lambda_2(\phi(u))$. Let $\theta_2(\phi(u)) = v_0' \cdots v_K'$; then $\{v_i', v_j'\} \in E_2$. Consequently, $\{\pi(v), \pi(w)\} = \{v_i', v_j'\} \in E_2$.

If $\{\pi(v), \pi(w)\} \in E_2$, then there exists $u' \in U_2$ such that $\pi(v), \pi(w) \in \theta_2(u')$. Without loss of generality, we can choose u' of minimal depth such that $\pi(v), \pi(w) \in \theta_2(u')$. This means that, for instance, the parent u'' of u' satisfies $\pi(v) \notin \theta_2(u'')$. Since $U_2' = \{u''' \in U_2 \mid \pi(v) \in \theta_2(u''')\}$ is connected, $U_2'$ is entirely contained in the subtree rooted at u'. Suppose, towards a contradiction, that there is no $u \in U_1$ such that $\phi(u) = u'$. Since $\phi$ preserves the root and the parent-child relation, $\phi(U_1)$ is closed under taking parents, so $\phi(U_1) \cap U_2' = \emptyset$. On the other hand, by the definition of $\pi$, there is $u \in U_1$ such that $v \in \theta_1(u)$ and $\pi(v) \in \theta_2(\phi(u))$. So $\phi(u) \in \phi(U_1) \cap U_2'$, a contradiction. Thus there is $u \in U_1$ such that $u' = \phi(u)$. Let $\theta_1(u) = v_0 \cdots v_K$ and $\theta_2(u') = v_0' \cdots v_K'$; by injectivity of $\pi$, we have $\pi(v) = v_i'$, $\pi(w) = v_j'$, $v = v_i$ and $w = v_j$ for some i, j. Then $(i, j) \in \Lambda_2(u') = \Lambda_2(u)$, which proves that $\{v, w\}$ is an edge of $G_1$. □

Now we are ready to show that $\preceq$ is a wqo for $\mathcal{T}_{B,K}$. Let $T_0, T_1, \ldots$ be an infinite sequence of data trees from $\mathcal{T}_{B,K}$. Consider the infinite sequence of $\Sigma_{G,K}$-labeled trees $\mathcal{T}_0', \mathcal{T}_1', \ldots$ obtained from the tree decompositions (with width K and depth at most K) of the graphs $G_\ell(T_0), G_\ell(T_1), \ldots$. Then there are $i < j$ such that $\mathcal{T}_i' \le \mathcal{T}_j'$, because $\le$ is a wqo for labeled trees of depth at most K. So $G_\ell(T_i) \sqsubseteq G_\ell(T_j)$ by Proposition 11, and $T_i \preceq T_j$ by Proposition 8. We thus obtain the following theorem.

Theorem 12. $\preceq$ is a well-quasi-ordering over $\mathcal{T}_{B,K}$.

6. VERIFICATION OF TEMPORAL PROPERTIES

Until now we considered only two properties for static analysis: termination and pattern reachability. (Non-reachability of a DTP can easily be expressed in Tree-LTL [5], which corresponds roughly to linear-time temporal logic where atomic propositions are DTPs.) We show in this section that allowing runs of unbounded length makes the validation of (even simple) Tree-LTL properties undecidable, even without data:

Proposition 13. It is undecidable whether a TPRS with initial (data-free) tree $T_0$ satisfies a given Tree-LTL formula.

The undecidability of Tree-LTL does not prevent us from verifying quite complicated properties. We show on the Play.com example how to proceed: it suffices to encode the property we want to check into the system using additional tags, and to check for pattern reachability. For example, suppose that we want to verify whether a customer can add some product after the bill was processed. To this end, we add new tags 1, 2, 3 to the alphabet, and we add one child labeled 1 to Play.com in the initial tree. We add one rule which checks that the additional tag is 1 and selects one cart in the payment mode. The outcome of this rule is to change the tag from 1 to 2, and to append # as a child of the selected cart. We add another rule which checks that the tag is 2. The outcome of this rule is to change the tag from 2 to 3, and to append below PCatalog a new product with one item in the inventory, with a special marker # as a sibling of PId. Now, in the new system, one can reach a tree containing a cart marked # with a product with PId X, such that there exists a Product with PId X in the PCatalog marked by #, iff a customer can add some product after the bill was generated in the original system. The former property is a pattern reachability problem, which we proved to be decidable.

4 Such formulas actually use free variables in patterns, which are then quantified universally. This is consistent with the approach of testing whether a model satisfies the negation of a formula.



7. BOUNDED MODEL-CHECKING DTPRS AND RECURSION-FREE GAXML

Recall that [5] shows that the largest decidable fragment of GAXML that can be model-checked w.r.t. Tree-LTL properties is the recursion-free one. Absence of recursion in GAXML roughly means (1) disallowing recursive DTDs (as we do here) and (2) requiring that no function is called more than once on any execution path. On the other hand, one can use negated DTPs in this fragment.

In this section we consider bounded model-checking for DTPRS: given a DTPRS $(\mathcal{R}, \Delta)$, a set of initial trees Init, a DTP P and a bound N (encoded in unary), we ask whether there is some $T_0$ satisfying Init and some T such that P matches T and $T_0 \xrightarrow{\le N} T$. We show the following result:

Theorem 14. Bounded model-checking for DTPRS is 
NexpTime-complete. 

Theorem 14 can actually be extended to bounded model-checking of Tree-LTL properties. Bounded model-checking of a Tree-LTL formula $\varphi$ with a bound N is the problem of checking whether a counter-example for $\varphi$ exists within at most N steps. For instance, bounded model-checking of $G \neg P$ with a bound N amounts to checking whether the DTP P can be reached in at most N steps.
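At its core, the $G \neg P$ instance is a depth-bounded search. The following Python sketch performs breadth-first search up to N steps over an abstract successor function; it assumes, as a hypothetical simplification, that `succ` returns finitely many representatives (for DTPRS, one tree per class of fresh data values), and the name `bounded_reach` is illustrative. It says nothing about the NexpTime procedure via GAXML described below.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def bounded_reach(inits: Iterable[Hashable],
                  succ: Callable[[Hashable], Iterable[Hashable]],
                  matches: Callable[[Hashable], bool],
                  n: int) -> bool:
    """Is some state satisfying `matches` reachable in at most n steps?"""
    queue = deque((s, 0) for s in inits)
    seen = {s for s, _ in queue}
    while queue:
        s, k = queue.popleft()            # BFS: states come out by increasing step count
        if matches(s):
            return True
        if k < n:
            for t in succ(s):
                if t not in seen:         # the first visit is at minimal depth, so pruning is safe
                    seen.add(t)
                    queue.append((t, k + 1))
    return False


def increment(x: int):
    return [x + 1]


print(bounded_reach([0], increment, lambda x: x == 5, 4))   # False
print(bounded_reach([0], increment, lambda x: x == 5, 5))   # True
```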

For the upper bound, we show how to encode a DTPRS $(\mathcal{R}, \Delta)$ with the given bound N into a recursion-free GAXML system, and use the upper bound provided by [5]. We recall that [5] provides a simply exponential bound in the number of transition steps of the recursion-free GAXML system. Also notice that the DTPRS in Theorem 14 are not required to be positive: the lower bound relies on negations of DTPs.

The basic idea of the reduction from bounded model-checking of DTPRS to recursion-free GAXML is the following: we "guess" on the fly the rules $R_1, R_2, \ldots, R_M$, $M \le N$, that are applied on a successful path of the DTPRS, and use function labels to pinpoint the nodes used by the matchings of the corresponding locators. Suppose that nodes of locators have identifiers. A node having a child (leaf) labeled by the function call $!(i, R_i, w)$, with $i \le M$, $R_i \in \mathcal{R}$ and w an identifier within the locator $L_i$ of rule $R_i$, is "guessed" to correspond to node w in the matching of the locator $L_i$ when applying rule $R_i$. Notice that a node can have several function calls $!(i, R_i, v), !(j, R_j, w)$ attached to it (but then $i \ne j$, since the matching of $L_i$must be injective).

We use the DTD in the invariant $\Delta$ to ensure (1) that each of the (polynomially many) function labels $!(i, R_i, w)$ occurs at most once at any time point, and we use GAXML call/return guards to ensure (2) that, for each i, calls related to rule $R_i$ are only performed after all function calls $(j, R_j, v)$ with $j < i$ have been completed. Checking (2) is done by forbidding the presence of function calls $!(j, R_j, v)$ and $?(j, R_j, v)$ with $j < i$ whenever $!(i, R_i, w)$ is called. Similarly, when the result of a call $?(i, R_i, w)$ is returned, we forbid that it contains some label $!(j, R_j, v)$ or $?(j, R_j, v)$ with $j < i$.

Applying rule $R_i$ means calling all functions $!(i, R_i, w)$ one by one (say in DFS order) and performing the associated actions. When we call the first function with index i, its guard also checks that the locator $L_i$ was properly guessed. A rename action must be simulated, since GAXML has no renaming facility: a call $!(i, R_i, w)$ with w labeled by $ren_b$ in $L_i$ has the effect of attaching a label $b_i$ as a child (leaf) of node w. Checking the current label of a node is done using negative guards: we look for a child $b_i$ that has no sibling $c_k$ with $k > i$. A delete action must be simulated too, since GAXML has no deletion. A call $!(i, R_i, w)$ with w labeled by del in $L_i$ has the effect of returning a node with tag del. This might be syntactically inconsistent if the current node is a data node; however, standard encoding tricks can remedy this problem. For simplicity, let us assume that the tag del is always appended as a sibling of the node that is supposed to be deleted in the DTPRS.

Finally, the append action is done as in GAXML, by performing the query and attaching the result. Here, we need to take care of the nodes that are added via a forest $F \in \mathcal{F}$, resp. via a query. For such nodes, we must "guess" some attached function calls. In both cases we use external GAXML functions: for example, we can simulate the addition of an annotated copy of F via an external call, and use $\Delta$ to check that the right F was added. For query answers, we may split a query Q into polynomially many copies, where for some of the copies the head has attached external function calls. Their role is to generate (sets of) functions of type $!(j, R_j, v)$ attached to nodes in the head.

The last point is that we need to adapt the GAXML upper bound in order to take care of nodes marked by del: this can be done, e.g., by extending the notion of matching of tree patterns so that none of the nodes to which tree patterns are matched, nor their ancestors, is allowed to have a child labeled by the tag del. It can easily be checked that the complexity of checking a Tree-LTL formula for recursion-free GAXML still holds with this extended notion of matching. The reason is that the proof is based on small models obtained by taking tree prefixes of the bigger model. Obviously, the absence of children with tag del is preserved by taking tree prefixes.

The lower bound is adapted from the 2-NexpTime lower bound proof for recursion-free GAXML. We only recall the rough idea here.

The main ingredient of the proof is to create/check lists of length $2^n$. This is done using data values, similarly as in the proof of Theorem 3. A "list" of length k corresponds to a tree of depth 2, where each node at depth one has 2 children, with distinct data values $d_i, d_{i+1}$. If each data value $d_i$ occurs twice (except for $d_1$ and $d_{k+1}$, which occur only once), we get a linear order, i.e. a list. Using n queries we can compute n steps of transitive closure and thus verify that $k = 2^n$. Obviously, this suffices for encoding a $(2^n \times 2^n)$ tableau representing a computation of a $2^n$-time bounded TM. Details are fairly easy to complete.
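The counting trick can be illustrated outside GAXML: starting from the pairs at distance exactly 1, each of n rounds of relational composition doubles the distance, so after n rounds $(d_1, d_{k+1})$ is present iff $k = 2^n$. The Python sketch below assumes (hypothetically) that the two children of a depth-one node are ordered as $(d_i, d_{i+1})$ and that the structure has no extra disjoint cycles, which the occurrence-count test alone does not exclude; `list_tree` and `has_length_two_to_the` are illustrative names, and each loop iteration merely stands in for one query.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[int, int]


def list_tree(k: int) -> List[Pair]:
    """Depth-one nodes of a 'list' of length k: node i carries data values (d_i, d_{i+1})."""
    return [(i, i + 1) for i in range(1, k + 1)]


def has_length_two_to_the(pairs: List[Pair], n: int) -> bool:
    counts = Counter(d for pair in pairs for d in pair)
    if any(c > 2 for c in counts.values()):
        return False                          # some data value occurs more than twice
    ends = [d for d, c in counts.items() if c == 1]
    lefts = {a for a, _ in pairs}
    first = next((d for d in ends if d in lefts), None)
    last = next((d for d in ends if d not in lefts), None)
    if len(ends) != 2 or first is None or last is None:
        return False
    rel = set(pairs)                          # pairs at distance exactly 1
    for _ in range(n):                        # one round per query: distance 2^m -> 2^(m+1)
        rel = {(a, c) for a, b in rel for b2, c in rel if b == b2}
    return (first, last) in rel               # d_1 and d_{k+1} are at distance 2^n


print(has_length_two_to_the(list_tree(8), 3))   # True
print(has_length_two_to_the(list_tree(6), 3))   # False
```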

8. CONCLUSION 

In this paper, we defined a rich class of systems describing active documents, possibly with recursive calls. We showed that this class of systems is easy to use and powerful, demonstrating it on the Mailorder example. We studied the boundary of decidability for different properties and restrictions of the active documents. Namely, we showed that termination from one document and pattern reachability from a set of documents are both decidable for positive DTPRSs, which are DTPRSs where the DTD is non-recursive, there are no negative guards or data invariants, and simple paths are bounded. We showed that without these restrictions, these problems are undecidable. We also showed that verification is undecidable for more complex properties (Tree-LTL, or termination from a set of active documents). Nevertheless, we also demonstrated on the Mailorder example that one can find bugs with our method.

Compared with GAXML [5], the respective restrictions used to get decidability (positiveness and non-recursion) are incomparable. We showed, however, a reduction from (not necessarily positive) rewriting-length bounded DTPRS to recursion-free GAXML.

As for further work, it seems possible to get decidability results for another (incomparable) class of systems, namely DTPRSs whose new data variables do not get mutually distinct fresh values, but possibly arbitrary data values.

9. REFERENCES 

[1] P. Abdulla, K. Cerans, B. Jonsson, and Y.-K. Tsay. 
General decidability theorems for infinite-state 
systems. In LICS'96, pages 313-321. IEEE, 1996. 

[2] P. A. Abdulla, A. Bouajjani, J. Cederberg, F. Haziza, 
and A. Rezine. Monotonic abstraction for programs 
with dynamic memory heaps. In CAV'08, volume 5123 
of LNCS, pages 341-354. Springer, 2008. 

[3] S. Abiteboul, O. Benjelloun, and T. Milo. Positive 
Active XML. In PODS'04, pages 35-45. ACM, 2004. 

[4] S. Abiteboul, O. Benjelloun, and T. Milo. The Active XML project: an overview. VLDB J., 17(5):1019-1040, 2008.

[5] S. Abiteboul, L. Segoufin, and V. Vianu. Static analysis of active XML systems. In PODS'08, pages 221-230, New York, NY, USA, 2008. ACM.

[6] A. Blumensath and B. Courcelle. On the monadic 
second-order transduction hierarchy. HAL Archive, 
2009. http://hal.archives-ouvertes.fr/hal-00287223/fr. 

[7] P. Bouyer, N. Markey, J. Ouaknine, P. Schnoebelen, 
and J. Worrell. On termination for faulty channel 
machines. In STACS'08, volume 1 of Leibniz 
International Proceedings in Informatics, pages 
121-132, 2008. 

[8] C. David. Complexity of data tree patterns over XML documents. In MFCS'08, pages 278-289, Berlin, Heidelberg, 2008. Springer-Verlag.

[9] A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In ICDT'09, pages 252-267. ACM, 2009.

[10] A. Deutsch, L. Sui, and V. Vianu. Specification and verification of data-driven web applications. J. Comput. Syst. Sci., 73(3):442-474, 2007.

[11] A. Deutsch, L. Sui, V. Vianu, and D. Zhou. Verification of communicating data-driven web services. In PODS'06, pages 90-99. ACM, 2006.

[12] A. Deutsch and V. Vianu. WAVE: Automatic verification of data-driven web services. IEEE Data Eng. Bull., 31(3):35-39, 2008.

[13] A. Finkel and P. Schnoebelen. Well-structured transition systems everywhere! Theor. Comput. Sci., 256(1-2):63-92, 2001.

[14] X. Fu, T. Bultan, and J. Su. Conversation protocols: a formalism for specification and verification of reactive electronic services. Theor. Comput. Sci., 328(1-2):19-37, 2004.

[15] B. Genest, A. Muscholl, O. Serre, and M. Zeitoun. Tree pattern rewriting systems. In ATVA'08, pages 332-346. Springer, 2008.

[16] R. Hull. Artifact-centric business process models: Brief survey of research results and challenges. In OTM '08: Proceedings of the OTM 2008 Confederated International Conferences, pages 1152-1163. Springer-Verlag, 2008.

[17] R. Hull, M. Benedikt, V. Christophides, and J. Su. E-services: a look behind the curtain. In PODS'03, pages 1-14. ACM, 2003.

[18] R. Mayr. Undecidable problems in unreliable computations. Theor. Comput. Sci., 297(1-3):337-354, 2003.

[19] J. Nešetřil and P. O. de Mendez. Tree-depth, subgraph coloring and homomorphism bounds. Eur. J. Comb., 27(6):1022-1041, 2006.

[20] J. Wang and A. Kumar. A framework for document-driven workflow systems. In Business Process Management, pages 285-301, 2005.



APPENDIX 

Proof of Proposition 2: 

In both cases we encode a 2-counter machine with counters a, b. In the first one, a configuration (q, n_a, n_b) ∈ Q × ℕ × ℕ is encoded by a tree with root labeled q and two subtrees, one consisting of a chain of n_a nodes labeled a ending in a leaf labeled a$, and the other consisting of a chain of n_b nodes labeled b ending in a leaf labeled b$. For example, a zero test on the first counter corresponds to checking that the root has a child labeled a$. Decrementing the first counter in state q (and going to state q') is done using the locator [q](||[a]([a$])), where the additional labels are ren_{q'} for the root, ren_{a$} for the a-node, and del for the a$-node.

With non-recursive DTDs, we can encode a configuration (q, n_a, n_b) by a tree of depth one, with root labeled by q, and n_a (n_b, resp.) leaves labeled by a (b, resp.). The zero test is now done using a negative guard (e.g. "no a-leaf") or a negative invariant. In the latter case we split a transition into 2 steps: first we relabel the root by a transition from that state; second, we perform the corresponding rewriting as before. The invariant states that whenever the root is labeled by a transition corresponding to a zero test of counter c, the tree has no c-leaf (c ∈ {a, b}). □
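The second (depth-one) encoding with negative guards can be simulated directly, which may help in checking that each machine instruction corresponds to one rewriting step. In the Python sketch below the tree is represented by its root label plus a multiset of leaf tags; the instruction format and the names `run` and `transfer` are hypothetical and not taken from the paper. This is a simulator, not a decision procedure.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical instruction format:
#   ("inc", c, q')       append a c-leaf, rename the root to q'
#   ("dec", c, q')       match a c-leaf and delete it, rename the root to q'
#   ("jz", c, q0, q1)    negative guard "no c-leaf": go to q0 if it holds, else to q1
Program = Dict[str, tuple]


def run(program: Program, q: str, na: int, nb: int,
        max_steps: int = 10_000) -> Tuple[str, int, int]:
    root, leaves = q, Counter(a=na, b=nb)      # depth-one tree: root label + multiset of leaves
    for _ in range(max_steps):
        if root not in program:                # no rule applies: halting state
            return root, leaves["a"], leaves["b"]
        op, c, *targets = program[root]
        if op == "inc":
            leaves[c] += 1
            root = targets[0]
        elif op == "dec":
            if leaves[c] == 0:
                raise ValueError(f"no {c}-leaf for the locator to match")
            leaves[c] -= 1
            root = targets[0]
        elif op == "jz":
            root = targets[0] if leaves[c] == 0 else targets[1]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("step bound exceeded")


# Move the content of counter a into counter b, then halt.
transfer = {"q0": ("jz", "a", "halt", "q1"),
            "q1": ("dec", "a", "q2"),
            "q2": ("inc", "b", "q0")}
print(run(transfer, "q0", 3, 0))   # ('halt', 0, 3)
```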

Proof of Proposition 7: 

It is sufficient to consider $\min(Pred(\uparrow T))$, the set of minimal elements w.r.t. $\preceq$ of $Pred(\uparrow T)$, for each tree $T \in \mathcal{T}_{B,K}$. Fix a rule $R = (L, G, Q, \mathcal{F}, \chi)$ with

$L = (V_L, E_L, root_L, \ell_L, \ell'_L, cond_L, \|_L)$

such that $\ell'_L$ attaches the additional labels {append, ren_a, del}, $Q = \{Q_1, \ldots, Q_m\}$ ($Q_i = body_i \rightsquigarrow head_i$), and $\mathcal{F} = \{F_1, \ldots, F_n\}$. Let $T_1 = (V_1, E_1, root_1, \ell_1) \in \min(Pred(\uparrow T))$; then

(*) there is $T'$ such that $T_1 \xrightarrow{R} T'$ w.r.t. some $\phi: L \to T_1$, and $T \preceq T'$ via some $\psi: T \to T'$.

In the following, we show that the size (number of nodes) of $T_1$ is bounded by the following constant (we actually show even more, by exhibiting such a $T_1$ satisfying $\Delta$):

$$B^2 \big((|\Sigma| + 1) \max(\Delta)\big)^B \big(|L| + |G| + |T| \max_i |Q_i|\big).$$

Thus a finite basis, which is a finite subset of $\min(Pred(\uparrow T))$, is computable.

Let $T' = (V', E', root', \ell')$. Then V' consists of four disjoint subsets:

• $V_1' = \{\phi(v) \mid v \in V_L, v \text{ not labeled by del}\}$,

• $V_2' = node^{-1}(V_1')$, where $node^{-1}(V_1')$ is the set of nodes $w \in V_1 \setminus \phi(V_L)$ such that the lowest ancestor of w in $\phi(V_L)$ is in $V_1'$,

• $V_3'$ contains distinct copies of the $F_j$, excluding the leaves labeled by the $Q_i$,

• $V_4'$ contains distinct copies of the nodes of the forests $Q_i(T_1)$, one for each node labeled by $Q_i$ in each copy of $F_j$.

The node set of $T_1$ consists of $V_1'$, $V_2'$, $V_3 = \{\phi(v) \mid v \in V_L, v \text{ labeled by del}\}$, and $V_4 = node^{-1}(V_3)$.

Now we consider an upper bound on the size of $T_1$ that suffices for $T_1$ to satisfy (*):

• To guarantee the matching $\phi$ from L to $T_1$: the nodes in $V_1' \cup V_3 = \phi(V_L)$ and all their ancestors in $T_1$ are sufficient. Note that ancestor relations || may occur in L, so including the ancestors of the nodes in $V_1' \cup V_3 = \phi(V_L)$ is necessary.
Size: $B|\phi(V_L)| = B|L|$;

• To witness that G is satisfied over $T_1$ w.r.t. $\phi$: G is a positive Boolean combination of DTPs. To witness the satisfaction of each DTP $P_i$ in G, we need to keep a matching $\phi_i$ from $P_i$ to $T_1$ and all the ancestors of the nodes of $\phi_i(P_i)$ in $T_1$.
Size: $B|G|$;

• To guarantee that $T \preceq T'$:

– keep $(V_1' \cup V_2') \cap \psi(V_T)$ and all their ancestors in $T_1$,

– at most |T| instantiations of $head_i$ on $T_1$ by matchings from $body_i$ to $T_1$ w.r.t. $\phi$ are sufficient. The ancestors of all the nodes of $T_1$ in these instantiations should be preserved as well.

Size: $B|T| + B|T||body_i| \le B|T| \max_i |Q_i|$.

• Finally, to satisfy the DTD in $\Delta$, $T_1$ should be completed into a data tree of size at most (cf. [5])

$$B \big((|\Sigma| + 1) \max(\Delta)\big)^B |T_1|,$$

where $\max(\Delta)$ is the maximum integer used in the definition of the DTD in $\Delta$.

Thus a sufficient upper bound for the size of $T_1$ is $B^2 \big((|\Sigma| + 1) \max(\Delta)\big)^B \big(|L| + |G| + |T| \max_i |Q_i|\big)$.

□

Proof of Proposition 10. 

Let G = (V, E) and $\mathcal{T} = (W, F, r, \theta)$ be a tree decomposition of G of width at most A and depth at most B.

Let $P = v_1 \cdots v_n$ be a path in G, and $w_1 \cdots w_n$ be a trace of P in $\mathcal{T}$ such that $v_i \in \theta(w_i)$, and either $w_i = w_{i+1}$ or there is a path in $\mathcal{T}$ from $w_i$ to $w_{i+1}$ such that $v_i \in \theta(w)$ for each $w \ne w_{i+1}$ on the path.

Because all bags have size at most A + 1, each bag can occur at most A + 1 times in the sequence $w_1 \cdots w_n$.

Let $B_0$ be the minimal depth of the $w_i$'s. Then there is only one bag at depth $B_0$, say w, occurring in the sequence $w_1 \cdots w_n$.

Let w_{i_1}, ⋯, w_{i_l} (l ≤ A + 1, i_j < i_{j+1}) be all the occurrences of w on the sequence w_1 ⋯ w_n. Then all the bags on each subsequence w_{i_j+1} w_{i_j+2} ⋯ w_{i_{j+1}−1} are at depth at least B_0 + 1, and so are those on the prefix w_1 ⋯ w_{i_1−1} and on the suffix w_{i_l+1} ⋯ w_n, which gives l + 1 subsequences in total. By the induction hypothesis, each of them has length at most

$$(A+2)^{B-B_0-1} + \sum_{1 \le i < B-B_0-1} (A+2)^{i},$$
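thus

$$\begin{aligned}
n &\le l + (l+1)\Big((A+2)^{B-B_0-1} + \sum_{1 \le i < B-B_0-1} (A+2)^{i}\Big)\\
  &\le (A+2)\Big(1 + (A+2)^{B-1} + \sum_{1 \le i < B-1} (A+2)^{i}\Big)\\
  &= (A+2)^{B} + \sum_{1 \le i < B} (A+2)^{i},
\end{aligned}$$

where the second inequality uses l ≤ A + 1 and B_0 ≥ 0.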

□ 
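The closing equality of this chain is purely arithmetic, so it can be spot-checked numerically. In the sketch below the function names are hypothetical: trace_bound(A, B) computes (A+2)^B + Σ_{1≤i<B} (A+2)^i, and step_identity_holds tests (A+2)(1 + trace_bound(A, B−1)) = trace_bound(A, B). The sweep covers B ≥ 2, the range in which the identity holds as written.

```python
def trace_bound(A: int, B: int) -> int:
    """(A+2)^B + sum_{1 <= i < B} (A+2)^i, the bound at the end of the chain."""
    return (A + 2) ** B + sum((A + 2) ** i for i in range(1, B))


def step_identity_holds(A: int, B: int) -> bool:
    """Last equality of the chain: (A+2)(1 + bound for B-1) == bound for B."""
    return (A + 2) * (1 + trace_bound(A, B - 1)) == trace_bound(A, B)


print(trace_bound(1, 3))
print(all(step_identity_holds(A, B) for A in range(6) for B in range(2, 8)))
```

For width A = 1 and depth B = 3 the bound is 3³ + 3 + 3² = 39.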



Proof of Proposition 13.

We give a reduction from the exact reachability problem, i.e., checking whether a tree T_2 can be reached exactly from a tree T_1 via a TPRS R. This problem was shown to be undecidable in [15] by a reduction from the reachability problem for reset Petri nets.⁵

We reduce the exact reachability problem for R to checking the Tree-LTL formula φ = G(P_1 → F P_2), where P_1 and P_2 are DTPs, for a TPRS R' and an initial tree T_0.
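As a hedged illustration of the shape of φ = G(P_1 → F P_2), the sketch below evaluates it on a finite run abstracted as the list of DTP names matched at each step. Both the finite-trace semantics and the string encoding of DTPs are assumptions made for the sketch, not the paper's definition of Tree-LTL; the function name is hypothetical.

```python
def holds_g_implies_f(trace: list[set[str]], p: str, q: str) -> bool:
    """Evaluate G(p -> F q) on a finite trace (finite-trace semantics assumed):
    every position where p holds must see q at that position or later."""
    pending = False  # True while some p-position has not yet been followed by q
    for step in trace:
        if q in step:
            pending = False  # q discharges every earlier p, and p at this step
        elif p in step:
            pending = True
    return not pending


print(holds_g_implies_f([{"P1"}, set(), {"P2"}], "P1", "P2"))
print(holds_g_implies_f([set(), {"P1"}, set()], "P1", "P2"))
```

In the reduction, a run that matches P_1 (a γ-node under root) and then never matches P_2 (a δ-node under root) is the kind of run that violates φ.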

Let Σ be the set of tags of R, and let Σ_m be a disjoint copy of Σ. We use five new tags root, α, β, γ, δ ∉ Σ, so the set of tags of R' is Σ' = Σ ∪ Σ_m ∪ {root, α, β, γ, δ}.

The starting tree T_0 consists of a root with two children: one labeled α, and one whose subtree equals T_1.

The TPRS R' consists of R (adapted to take the additional root into account), plus two kinds of new rules (with empty guards):




[Figure: first additional rule (rewriting below the node labeled root).]

In the rule above, T_2^m means that we use the tag copy Σ_m for T_2. Technically, the rule renames α to β, renames each tag a ∈ Σ in T_2 to a_m ∈ Σ_m, and appends a new node with tag γ.

The second kind of additional rule is the following (here, a ∈ Σ is a parameter of the rule):



[Figure: second additional rule (rewriting below the node labeled root).]

Technically, the rule above deletes the node a together with the subtree rooted at a, and renames γ to δ.

Finally, the Tree-LTL property to be checked on (R', A') and the initial tree T_0 is φ = G(P_1 → F P_2), where P_1 = [root]([γ]) and P_2 = [root]([δ]). It is easy to see that T_2 is exactly reachable from T_1 via R if and only if R' does not satisfy φ from the initial tree T_0: the TP P_1 can be generated only if, from T_1, R reaches a tree that contains T_2 as a prefix. Moreover, once P_1 has been generated, P_2 can be avoided only if the tree reached from T_1 is exactly T_2 (cf. the second rule). □
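To make the role of the two new rules concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: trees are ordered, T_2 is matched as an ordered prefix (its children map to the first children of the corresponding node), a marked tag a_m is encoded as the string a + "_m", and the second rule may delete an unmarked Σ-subtree anywhere below root. The names Node, rule1, rule2 and delta_after_marking are hypothetical. Starting from root(α, S), where S stands for the tree reached from T_1, the sketch fires the first rule and then tries the second; δ appears exactly when S strictly extends T_2.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)


def is_prefix(pattern: Node, tree: Node) -> bool:
    # Assumption: ordered prefix, pattern children map to the first tree children.
    return (pattern.tag == tree.tag
            and len(pattern.children) <= len(tree.children)
            and all(is_prefix(p, t) for p, t in zip(pattern.children, tree.children)))


def mark(pattern: Node, tree: Node) -> None:
    # Rename every node matched by `pattern` from tag a to its copy a_m.
    tree.tag += "_m"
    for p, t in zip(pattern.children, tree.children):
        mark(p, t)


def rule1(root: Node, t2: Node) -> bool:
    """First new rule: alpha -> beta, mark a T2-prefix, append gamma."""
    alphas = [c for c in root.children if c.tag == "alpha"]
    if not alphas:
        return False
    for child in root.children:
        if child.tag != "alpha" and is_prefix(t2, child):
            mark(t2, child)
            alphas[0].tag = "beta"
            root.children.append(Node("gamma"))
            return True  # stop at once: the list was just modified
    return False


def delete_unmarked(node: Node, sigma: set[str]) -> bool:
    # Delete the first subtree below `node` whose root still has a tag in Sigma.
    for i, child in enumerate(node.children):
        if child.tag in sigma:
            del node.children[i]
            return True
        if delete_unmarked(child, sigma):
            return True
    return False


def rule2(root: Node, sigma: set[str]) -> bool:
    """Second new rule: delete an unmarked Sigma-subtree, rename gamma to delta."""
    gammas = [c for c in root.children if c.tag == "gamma"]
    if not gammas or not delete_unmarked(root, sigma):
        return False
    gammas[0].tag = "delta"
    return True


def delta_after_marking(reached: Node, t2: Node, sigma: set[str]) -> bool:
    """From root(alpha, reached): fire rule 1, then try rule 2 once."""
    root = Node("root", [Node("alpha"), copy.deepcopy(reached)])
    return rule1(root, t2) and rule2(root, sigma)


sigma = {"a", "b"}
t2 = Node("a", [Node("b")])
print(delta_after_marking(Node("a", [Node("b")]), t2, sigma))             # exactly T2
print(delta_after_marking(Node("a", [Node("b"), Node("b")]), t2, sigma))  # extra b
```

With S equal to T_2 every Σ-node is renamed, so the second rule cannot fire and no δ is produced; with one extra b-node the second rule deletes it and produces δ.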



⁵ Although the TPRS model used in [15] is slightly more general than the present model, it is easy to see that in the reduction in [15] we only use the TPRS model presented here.



